Preface



The fourth international conference on Scientific Computing in Electrical En- 
gineering (SCEE) was held at the Eindhoven University of Technology, from 
23rd to 28th June, 2002. It was sponsored by Philips Research Laborato- 
ries Eindhoven, the Eindhoven University of Technology, Computer Simula- 
tion Technology (CST) from Darmstadt, ABB Corporate Research, Thales 
Netherlands, the European Consortium for Mathematics in Industry (ECMI), 
the University of Rostock (organiser of SCEE-2000), the European network 
for Mathematics, Computing and Simulation for Industry (MACSI-net), the 
Royal Netherlands Academy of Arts and Sciences (KNAW), and the Scien- 
tific Computing Group of the Eindhoven University of Technology. 

The Program Committee consisted of: 

- Dr. Alain Bossavit, Électricité de France, Clamart, France.

- Dr. Uwe Feldmann, Infineon Technologies A.G., Munich, Germany.

- Prof.Dr. Leszek Demkowicz, University of Texas at Austin, USA.

- Dr. Michael Günther, Universität Karlsruhe, Germany.

- Prof.Dr. Ulrich Langer, Johannes Kepler Universität, Linz, Austria.

- Dr. Jan ter Maten, Philips Research Laboratories Eindhoven, The
Netherlands.

- Prof.Dr. Ursula van Rienen, Universität Rostock, Germany.

- Prof.Dr. Jaijeet Roychowdhury, University of Minnesota, USA.

- Prof.Dr. Wil Schilders, Technische Universiteit Eindhoven and Philips
Research Laboratories Eindhoven, The Netherlands.

- Prof.Dr. Thomas Weiland, Technische Universität Darmstadt, Germany.

As on all previous occasions there was widespread support both from the 
industrial sector and academia. This is considered to be an essential feature 
of this series of conferences, because it guarantees the relevance of the work 
to practical situations, while at the same time ensuring that long term ba- 
sic research is strongly represented. For this reason, the interaction between 
electrical or electronic engineers and the mathematics community is one of 
the main aims of the SCEE conferences. Computational electromagnetics, 
computational electrodynamics, circuit simulation and coupled problems are 
areas that are covered. 

The invited speakers at SCEE-2002 were:

- Prof. Antonio Di Carlo (Università degli Studi “Roma Tre”, Italy):
“G. Lamé versus J.C. Maxwell: how to reconcile them?”

- Prof.Dr. Susan C. Hagness (University of Wisconsin-Madison, USA):
“Frontiers in FDTD theory and applications”.

- Dr. Patrick Joly (INRIA, Le Chesnay, France): “Variational methods for
FDTD computations in time dependent electromagnetism”.

- Dr. Tom A.M. Kevenaar (Philips Research Laboratories, Eindhoven, The
Netherlands): “Methods and approaches for RF circuit simulation and
electromagnetic modelling”.

- Dr.-Ing. Rolf Schuhmann (TU Darmstadt, Germany): “Recent advances
in Finite Integration Technique for high frequency applications”.

- Dr.-Ing. habil. Peter Schwarz (Fraunhofer-Institut für Integrierte
Schaltungen, Dresden, Germany): “Continuous simulation of coupled systems”.

- Prof.Dr. Igor Tsukerman (University of Akron, Ohio, USA): “Generalized
finite element method in electromagnetic analysis: benefits and hurdles”.

In total there were 29 contributed oral presentations and 38 poster presenta- 
tions. 

It has always been the policy of these conferences to encourage participants
from all countries, with an emphasis on Europe. On this occasion this has
been remarkably successful: there were 111 participants from 13 countries.
Thus, the SCEE conferences have grown into truly international events.
This will continue to be the policy of the series.

A new feature of the SCEE-2002 conference was the Industry Day, organized 
on Tuesday, June 25th. This was a very successful event, with 8 renowned 
speakers from industry and academia, discussing the needs of the electronics 
industry, now and in future: 

- Prof.Dr. Robert M.M. Mattheij, Technische Universiteit Eindhoven, The
Netherlands.

- Ir. Gerard F.M. Beenker, Philips Research Laboratories Eindhoven, The
Netherlands.

- Dr. Frank Demming-Janssen, Computer Simulation Technology (CST),
Darmstadt, Germany.

- Dr. Wilhelm Dürr, Siemens Medical Solutions, Erlangen, Germany.

- Dr. Koen van Eijk, Magma Design Automation Inc, Eindhoven, The
Netherlands.

- Dr. Marc Swinnen, Sequence Design, Paris, France.

- Prof.Dr. A. Peter M. Zwamborn, TNO Physics and Electronic Laboratory,
The Hague, The Netherlands.

- Dr. Isaac Shpantzer, CeLight, Silver Spring, MD, USA.

The Industry Day meant a break away from technical detail, with talks
presenting a bird’s-eye view of scientific computing in electrical engineering.
Besides the registered conference participants, the Industry Day attracted
quite a number of additional participants from industry.

Another new feature of the conference was the short oral introduction of 
posters. In two one-hour sessions, contributors of posters were given a maxi- 
mum of 2 minutes to advertise their work. The general feeling was that this 
was a very successful initiative, to be continued in future SCEE conferences.

The papers appearing here fall into two categories. The first consists of
the invited papers given by the keynote speakers. The second consists of
the contributed papers, which were carefully refereed by the Program
Committee; it contains both the accepted papers presented orally at the
conference and the papers corresponding to the posters.

It is a pleasure to thank all of the people and institutions, both named
here and others, whose enthusiasm and hard work ensured the success of the
SCEE-2002 conference.



Eindhoven, November 2003 

Wilhelmus H.A. Schilders 
E. Jan W. ter Maten 
Stephan H.M.J. Houben 
Local Organising Committee SCEE-2002



G. Lamé vs. J.C. Maxwell: How to Reconcile Them? *



Antonio DiCarlo 

Università degli Studi “Roma Tre”,

Mathematical Structures of Materials Physics at DiS,
Via Vito Volterra, 62, I-00146 Roma, Italy



Abstract. Nowadays, after more than a century of inconsiderate divergence be- 
tween electromagnetic and mechanical field theories, we find it hard to bring them 
together. This can be best exemplified by the problematic status of the electro- 
dynamics of deformable media. The blame can be laid mainly on the limitations 
of the underlying theoretical frameworks and on the practitioners’ education, too 
narrow to bridge the gap between them. I would like to concentrate here on the first 
problem — even though I am convinced that the second one carries more weight. 



1 Lamé’s Treatise on Three-Dimensional Elastic Solids

I happen to have a copy of the second edition of Gabriel Lamé’s “Leçons
sur la Théorie Mathématique de l’Élasticité des Corps Solides” [1] on my
bookshelf. It was published in 1866. Its first edition had been published in
1852. Compare with the dates of the electromagnetic trilogy by Maxwell: “On
Faraday’s lines of force” appeared in 1856, “On the physical lines of force”
in 1861-2, and “A dynamical theory of the electromagnetic field” in 1865.

These lecture notes on the mathematical theory of elastic solids differ
strongly from any present-day book with a similar title. I do not mean by
this simply that their notions and notations are somewhat outdated, or their
mathematics obsolete, which would be quite trivial. The main difference is
that Lamé was much bolder and more oriented towards fundamental physics
than any of his modern followers in elasticity or solid mechanics. As a
consequence, his main aim was to study the vibrations and waves of a
peculiarly thin and outlandish solid medium: the all-pervading elastic ether.



1.1 Mathematical Physics as a New Science 

In the terse preface to the first edition, Lamé held the opinion that, at
the moment of his writing, the only well-founded chapters of proper
Mathematical Physics were the electrostatics of conducting bodies, the theory
of heat conduction and the mathematical theory of three-dimensional
elasticity. Insisting that these creations (contrary to the Rational Mechanics
of old) belonged exclusively to his own century, Lamé was unfair to the
“Geometers” of the Baroque period (Euler and the Bernoullis, to name the
most prominent ones), at least as much as a quantum physicist of the
twentieth century could be to him and his contemporaries in elasticity.

* Invited paper at SCEE-2002

Elasticity, the most difficult and the least developed of the three chapters
of Mathematical Physics, was also the most useful, he emphasized. It is
thought-provoking to read that Lamé already saw his own time as willing to
assess the importance of a mathematical theory through the immediate
benefits it could provide to industrial practice. We could with all the more
reason appropriate his judgment (think of our “Industry Day”), even though
we would not put elasticity at the top of our list of most useful theories.
However, Lamé did not focus on the engineering side of elasticity, which is
the most prominent nowadays, mainstream physics having repudiated it long ago.

1.2 Surrogate Sciences vs. Rational Physics 

In Lamé’s eyes, novel Mathematical Physics, while far deeper and wider in
scope than older Rational Mechanics, shared its standards of rigour and
precision. This quality differentiated Rational Physics from the host of
empirical treatments based on doubtful principles and ad hoc hypotheses,
whose only merit was practicality, and whose function was essentially
provisional:

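Malgré leur utilité actuelle, qui est incontestable, toutes ces théories
empiriques et partielles ne sont que des sciences d’attente. Leur règne
est essentiellement passager, intérimaire. Il durera jusqu’à ce que la
Physique rationnelle puisse envahir leur domaine.

[Despite their present usefulness, which is undeniable, all these
empirical and partial theories are but stopgap sciences. Their reign
is essentially transient and provisional. It will last until Rational
Physics is able to invade their domain.]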

I cannot be as confident as Lamé in the final victory of the one Rational
Physics over the myriad of special, empirical theories devised to cope with 
practical problems for which a truly scientific treatment is not yet available. 
This is for two reasons: firstly, I know that make-do sciences actively repro- 
duce, after swallowing morsels of true science; secondly, I doubt whether there 
is only one Rational Physics. However, I find his lucid depiction of the subtle 
rivalry between technical and scientific thinking in the age of technology a 
great contribution from him, and I stand on his side in this confrontation. 



1.3 Elasticity and Molecular Mechanics 

At variance with modern textbooks in continuum mechanics, elasticity is
defined by Lamé in terms of molecular interactions, even though in a rather
vague way. In the very first paragraph of Chap. 1, Sect. 1, Lamé calls elastic
the restoring forces which tend to bring molecules back to their equilibrium
positions. Notice that Lamé did not use his naive molecular picture as a
convenient pedagogic cartoon. Strange as it may appear to us, he and other
founding fathers of continuum physics strongly believed in the necessity of
an underlying discrete structure of matter. Lamé criticized Navier’s method
for establishing the general equations of three-dimensional elasticity, on the
grounds that it presupposed matter to be continuous, which he considered
flatly absurd and inadmissible.

The subtle role played in the nineteenth century by the molecular inter- 
pretation of elasticity is best expounded in the masterly historical account of 
structural mechanics by Benvenuto [2, Sect. 14.2]: 

Elasticity represented the most promising line of inquiry, not only
because of its extraordinary practical usefulness and the accuracy of
the theoretical synthesis that it permitted, but also because of the
implications of its general principles and equations. The molecular
interpretation of elastic behavior that Navier, Cauchy and Poisson
promoted led many scientists to attempt finally to unify and explain
all forces operating in Nature in the light of a universal law of
attraction and interatomic repulsion, like that foreseen by Boscovich.
[...] From this perspective, the laws of elasticity are by no means
restricted to a specific class of bodies, but express an inherent property
of matter itself.

Lamé shared that same perspective. Indeed, his definition of elasticity
concludes along very similar lines (third paragraph of Chap. 1, Sect. 1):

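L’élasticité est donc une des propriétés générales de la matière.
Elle est, en effet, l’origine réelle ou l’intermédiaire indispensable des
phénomènes physiques les plus importants de l’univers. C’est par elle
que la lumière se répand, que la chaleur rayonne, que le son se forme,
se propage et se perçoit, que notre corps agit et se déplace, que nos
machines se meuvent, travaillent et se conservent, que nos constructions,
nos instruments échappent à mille causes de destruction.

[Elasticity is therefore one of the general properties of matter.
It is, indeed, the real origin or the indispensable intermediary of the
most important physical phenomena of the universe. It is through
elasticity that light spreads, that heat radiates, that sound is formed,
propagated and perceived, that our body acts and moves, that our
machines move, work and endure, that our constructions and our
instruments escape a thousand causes of destruction.]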



1.4 Lamé’s View of Ether

The main focus of Lamé’s treatise on elasticity is undoubtedly the study
of small vibrations and linear waves. Out of one hundred and thirty-four
sections, only the eight sections of Chaps. 12 and 16 (making up twenty-eight
pages in all) are devoted to equilibrium problems of three-dimensional
elasticity. The notion of wave speed, with emphasis on the classification into
longitudinal and transversal waves, is introduced in Chap. 11. From Chap. 17
on (one hundred and eleven pages in all) Lamé strives to explain light waves
through elasticity theory, starting from Fresnel’s birefringence.

This endeavour leads him to the following conclusion: the phenomena of
light propagation in space, diffraction, and birefringence prove the ubiquitous
existence of ether beyond all conceivable doubt. While crediting the
mathematical theory of elasticity for this important and rigorous result, Lamé
was confident that accounting properly for the interaction between ethereal
and ponderable matter would have disclosed the secrets of a host of mysterious
and incomprehensible beings, ranging from caloric, electricity, magnetism,
universal attraction, cohesion, to chemical affinities.

2 Maxwell’s Electromagnetism vs. Lamé’s Elasticity

Lamé was wrong with his elastic ether. As is clear to us now, Maxwell was
right with his electromagnetic field theory, shaped on Faraday’s unorthodox
ideas. This makes me appreciate Maxwell’s penchant for understatement and
the “absurd and infuriating modesty” which Dyson reproached him with in
a witty and enlightening short essay [3]. But was Maxwell against ether?

2.1 Maxwell on Ether 

By no means. Consider the passing mention of his own theory that Maxwell 
uttered in the presidential address he gave at the annual meeting of the 
British Association for the Advancement of Science in 1870: 

Another theory of electricity which I prefer denies action at a distance 
and attributes electric action to tensions and pressures in an all- 
pervading medium, these stresses being the same in kind with those 
familiar to engineers, and the medium being identical with that in 
which light is supposed to be propagated. 

However, despite the kinship of “stresses” vaguely advocated by Maxwell, his
electromagnetic ether deeply differs from Lamé’s.



2.2 The Key Difference between Maxwell’s and Lamé’s Theories

As we have just seen, the key difference does not lie in being for or against
the existence of ether (cf. the entry “Ether” in the ninth edition of the
Encyclopaedia Britannica, written by Maxwell). Nor are their different
invariance properties as discriminating as is commonly adduced. After all,
“classical” (i.e., non-relativistic and non-quantum) field theories of
mechanics behave well under the action of slow¹ Lorentz changes in observer,
as is only to be expected.

The distinguishing feature that makes the real difference between Lamé’s
ether and Maxwell-Faraday’s ether is topological in nature. In electromagnetic
field theory there are plenty of real-valued physical quantities associated
with geometric objects (cells) of various dimensions, embedded in
four-dimensional space-time (cf. the discussion by Tonti in [8, Sect. 5.1]).
On the contrary, in continuum mechanics there is only one bona fide
real-valued quantity, namely work, associated with the cells of highest
dimension in the body-time manifold and their boundaries (beware: body-time,
not space-time!). Mechanical work is backed up by a team of vector-valued (and
covector-valued) physical quantities, associated with cells of different
dimensions. In electromagnetics, on the contrary, there are no bona fide
vector quantities (distrust appearances!).

¹ After my talk, Alain Bossavit assured me that this notion could be made
precise using well-established mathematical tools [4,5].



Space-time cells. I espouse the approach pioneered by Tonti [6-8] and
expounded by Mattiussi [9,10], among others. In the following, I consider
space-time as the product of a 3-dimensional space manifold times a
1-dimensional time line, thus adopting the viewpoint of an observer.
Space-time vectors decompose accordingly into space and time components.
A space vector is, by definition, a space-time vector with null time
component (and vice versa). The dichotomy (space, time) is invariant under
slow Lorentz changes in observer. It should not be confused with the
Minkowskian trichotomy (spacelike, lightlike, timelike), which is invariant
under a general Lorentz transformation [11]. All space vectors are spacelike,
and time vectors timelike, but the converse does not hold true.
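
The difference between the observer-dependent dichotomy and the Minkowskian trichotomy can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration only: units with c = 1, signature (-, +, +, +), vectors stored as (t, x, y, z) tuples, and the helper names `interval`, `minkowski_class`, `is_space_vector` and `is_time_vector`, none of which come from the text.

```python
def interval(v):
    """Minkowski quadratic form, signature (-, +, +, +), with c = 1 (assumed)."""
    t, x, y, z = v
    return -t * t + x * x + y * y + z * z


def minkowski_class(v):
    """Classify a nonzero space-time vector by the sign of its interval."""
    q = interval(v)
    if q > 0:
        return "spacelike"
    if q < 0:
        return "timelike"
    return "lightlike"


def is_space_vector(v):
    """Space vector for the chosen observer: null time component."""
    return v[0] == 0


def is_time_vector(v):
    """Time vector for the chosen observer: null space components."""
    return v[1] == 0 and v[2] == 0 and v[3] == 0


space_vec = (0.0, 1.0, 2.0, 0.0)   # a space vector, hence spacelike
time_vec = (3.0, 0.0, 0.0, 0.0)    # a time vector, hence timelike
slanted = (0.5, 1.0, 0.0, 0.0)     # spacelike, yet neither space nor time vector
```

The vector `slanted` is the counterexample to the converse: its interval is positive, so it is spacelike, but its time component does not vanish, so the observer does not regard it as a space vector.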

Consider the hierarchy of parallelepipedal cells in space-time, ranging
from 0-dimensional (an event, i.e. a place at an instant) to (3+1)-dimensional
(a chunk of space times a time lapse). A nondegenerate k-cell (i.e., a
k-dimensional cell) has k independent edges, which are space-time vectors.
More precisely, a k-cell (with k > 0) is the equivalence class of all
k-parallelepipeds sitting on the same 0-cell, lying in the same k-plane and
having the same k-volume (as a basic reference for this gadgetry, see
[12, Chap. 4]). When talking about the edges of a cell, I really mean the
edges of at least one of its representative parallelepipeds.

Let me call plumb all nondegenerate cells whose edges are either space or
time vectors; slant the other ones. Most cells are slant, but all of them can be
obtained as linear combinations of plumb cells. Among plumb cells, I single
out time-dipped cells, which have one time edge (they cannot have more); the
remaining ones, having no time edges, I call simply space cells. At the two
extremes of the hierarchy, all nondegenerate cells are plumb: at the bottom
(k = 0) all of them are space cells; at the top (k = 3+1) all nondegenerate
cells are time-dipped. In between, most cells are slant; however, they
decompose (in a unique way) into space and time-dipped components.

The distribution (in space-time) of a real-valued quantity associated with
k-dimensional cells is properly gauged by a (real-valued) k-form, which is,
by construction, the integrand that it makes sense to integrate on a k-cell,
yielding the amount of the gauged quantity contained in that cell. The
co-space and co-time components² of a k-form are singled out by integrating
it on space and time-dipped k-cells, respectively.
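
As a rough illustration of this split, the sketch below enumerates the coordinate basis k-forms of a (3+1)-dimensional space-time and sorts them into co-space and co-time ones. It assumes, purely for illustration, that basis k-forms can be labelled by k-element subsets of the coordinate names ("t", "x", "y", "z"), and that a basis form counts as co-time exactly when its label contains "t" (so that it pairs with time-dipped cells). The function name `strict_components` is hypothetical.

```python
from itertools import combinations

COORDS = ("t", "x", "y", "z")  # one time coordinate, three space coordinates


def strict_components(k, coords=COORDS):
    """Split the basis k-forms (k-element coordinate subsets) by kind."""
    co_space, co_time = [], []
    for labels in combinations(coords, k):
        # A basis form containing dt is detected only by time-dipped cells.
        (co_time if "t" in labels else co_space).append(labels)
    return co_space, co_time


for k in range(len(COORDS) + 1):
    co_space, co_time = strict_components(k)
    print(k, len(co_space), len(co_time))
```

For k = 0, 1, 2, 3, 4 the counts (co-space, co-time) come out as (1, 0), (3, 1), (3, 3), (1, 3), (0, 1), in agreement with the statement that only space cells occur at the bottom of the hierarchy and only time-dipped cells at the top.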



The basic structure of electromagnetic field theory. Maxwell’s play, when
staged in this transcription, has two leading characters: the electromagnetic
2-form and the charge-current 3-form. Maxwell’s equations establish
the existence of their companion potentials: the electromagnetic potential (a
1-form) and the charge-current potential (a 2-form). To sum up, we have a
nice symmetric panoply, centered on dimension 2: one 1-form, two 2-forms,
and one 3-form. All of them are real-valued; none of them sits at the extremes
(0 and 4).

² To be read space-conjugate and time-conjugate components, respectively.

Down-to-earth electric and magnetic quantities are elicited by evaluating
the co-space and co-time components of the above k-forms: the scalar-valued
charge density $\rho$ and the vector-valued current density J represent
respectively the co-space and co-time components of the charge-current 3-form;
the vector-valued electric flux density D and magnetic field intensity H
represent respectively the co-space and co-time components of the
charge-current potential 2-form; the vector-valued electric field intensity E
and magnetic flux density B represent respectively the co-time and co-space
components of the electromagnetic 2-form; the scalar potential V and the
vector potential A represent respectively the co-time and co-space components
of the electromagnetic potential 1-form.

It should be stressed that all of the above scalar fields and (space) vector
fields are but proxies of the thing-in-itself. Much structural information is
obliterated when those pallid substitutes are taken at face value: in fact,
while V is a true scalar field, $\rho$ is a pseudoscalar field (a space
density); only J and D should be regarded as true vector fields, while A and
E are covector fields, B is a pseudovector field, and H a pseudocovector field.

When referred to space vectors, the dichotomy (true, pseudo-) is synonymous
with (polar, axial). The independent dichotomy (covector, vector) is
related to the alternative between inner and outer orientation (in this order!).
These distinctions were well known to Maxwell [13]. However, they never
entered the physics vulgate, or evaporated early on. The electric and magnetic
fields E, B, et cetera can obviously be extended to any space dimension,
as bare vector fields. This is a futile exercise, however, since the delicate
underlying structure cannot be exported, as I now formally state.

Proposition. Take for granted that time is one-dimensional. Then the only
forms having the same number of strict co-time and strict co-space
components are those sitting exactly halfway from the extremes: if space has
even dimension, there are none; if space has dimension $2k-1$, k-forms will do.

Proof. Trivial: a k-form in n dimensions has $n!/((n-k)!\,k!)$ strict
components.

Proposition. Assume that space has dimension $2k-1$. Then k-forms have
$2k-1$ strict co-time (and co-space) components if and only if $k = 1, 2$.

Proof. Check that only $k=1$ and $k=2$ solve $(2k-1)!/((k-1)!\,k!) = 2k-1$.

Proposition. A nontrivial electromagnetic theory requires $k > 1$.

Proof. If space is one-dimensional, electricity and magnetism do not couple,
electromagnetic waves do not exist, the Poynting vector field S = E × H
(another true vector field, like J and D) is null.

In conclusion, Maxwell’s play can only be staged on a (3+1)-dimensional
space-time.
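
The two counting propositions are easy to check by brute force. The following sketch does so under one stated assumption: with one time dimension and n space dimensions, a k-form has C(n, k) strict co-space components and C(n, k-1) strict co-time components (one slot being taken by the time direction). The function names are illustrative, not taken from the text.

```python
from math import comb


def co_space_count(n, k):
    """Strict co-space components of a k-form when space has dimension n."""
    return comb(n, k)


def co_time_count(n, k):
    """Strict co-time components: the time direction fills one of the k slots."""
    return comb(n, k - 1) if k >= 1 else 0


def balanced_degrees(n):
    """Degrees k whose forms have equally many co-space and co-time components."""
    return [k for k in range(n + 2)
            if co_space_count(n, k) == co_time_count(n, k)]


def vector_like_degrees(max_k=20):
    """Values of k for which C(2k-1, k) equals the space dimension 2k-1."""
    return [k for k in range(1, max_k + 1) if comb(2 * k - 1, k) == 2 * k - 1]
```

Here `balanced_degrees(n)` is empty for even n and reduces to the single halfway degree for odd n (for instance `[2]` when n = 3), while `vector_like_degrees()` returns `[1, 2]` over the tested range. Combined with the requirement k > 1, only k = 2 survives, that is, a 3-dimensional space.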



The basic structure of mechanical field theory. Let me now transcribe
Lamé’s play into the same unifying language. The first major change in the
stage setting is that mechanical quantities are associated with cells
embedded in an (n+1)-dimensional body-time manifold, defined as the product of
an n-dimensional body manifold (0 < n ≤ 3) times a 1-dimensional time line.
It should be noted that introducing a body manifold does not privilege any
observer. Singling out a time line from the space-time continuum does call
for an observer, however.³ The cases n = 1, 2 describe corporeal curves and
surfaces respectively. Following Lamé, I will concentrate here on the
mechanics of space-filling bodies (n = 3). Even in this case it is of the
essence to keep body quite distinct from space.

Body-time cells are isomorphic to space-time cells, so only terminology
needs adapting. In particular, space cells translate into body cells. Not to
confuse the issue, I distinguish carefully between elements of the space
manifold, which I call places, and elements of the body manifold, which I call
points. A 0-cell is now a point at an instant, a body 1-cell a line element,
a body (n−1)-cell a facet, and a body n-cell a bulk element. At the top of
the cell hierarchy sit body-time lumps (bulk elements times time lapses).
Distributions of physical quantities associated with k-dimensional body-time
cells are gauged by k-forms, endowed with co-body and co-time components.

The important difference with respect to electromagnetic field theory
is that most mechanical quantities, and hence the corresponding forms,
are not real-valued. The notion of vector-valued forms surfaces in a passing
remark, entitled “A glimpse of other physical theories”, in [9]. Vector- and
covector-valued forms are explicitly introduced in [14,15], where their use is
rightly advocated for the evaluation of electromagnetic forces. The way these
papers treat mechanics is, however, far from satisfactory.

The fundamental mechanical descriptor is the placement 0-form p, a
place-valued field attaching to each point at each instant (i.e., to each
0-cell) a place in space. At variance with electromagnetic forms, the very
definition of placement calls for an observer: a different observer sees a
correspondingly transformed placement, as decreed by the action of the group
of changes in observer on space-time. The restriction of a placement to all
simultaneous 0-cells is required to be an embedding. The (exterior)
differential of the placement 0-form is the (space) vector-valued
displacement 1-form, whose co-time and co-body components are represented
respectively by the velocity $\dot{p}$ and the body gradient of placement
$\nabla p$. To focus on the essentials, I will stick here to an affine
space-time, whose tangent bundle has a canonical global connection, whereby
all tangent spaces are trivially identified with each other. Handling
vector-valued forms requires an easy extension of the rules valid for
real-valued forms (see [17, Def. 6.3.11]).

³ A proper-time line may be attached to each body point independently of any
observer. But an observer is required to trivialize the proper-time bundle,
i.e. to identify lines attached to different points with each other (see
[11, Sect. 1.4]).
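
To make the two components of the displacement 1-form tangible, here is a minimal numerical sketch. It assumes a hypothetical placement of a one-dimensional body (n = 1) in three-dimensional space, namely a bar that stretches while drifting, and approximates the velocity and the body gradient by central differences. The function names and the sample placement are illustrative choices, not part of the text.

```python
def placement(b, t):
    """Hypothetical placement: body point b at instant t -> place (x, y, z)."""
    return ((1.0 + 0.1 * t) * b, 0.5 * t, 0.0)


def central_difference(f, x, h=1e-6):
    """Componentwise central difference of a tuple-valued function of one variable."""
    forward, backward = f(x + h), f(x - h)
    return tuple((a - c) / (2.0 * h) for a, c in zip(forward, backward))


def velocity(b, t):
    """Co-time component: rate of change of place at a fixed body point."""
    return central_difference(lambda s: placement(b, s), t)


def body_gradient(b, t):
    """Co-body component: rate of change of place along the body at a fixed instant."""
    return central_difference(lambda c: placement(c, t), b)
```

For this particular placement the exact values are velocity (0.1 b, 0.5, 0) and body gradient (1 + 0.1 t, 0, 0), which the finite differences reproduce up to rounding.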

In dynamics, a key role is played by test velocities, vector-valued 0-forms
(zero should be emphasized⁴) sharing the physical dimensions of $\dot{p}$.
No differential compatibility is required between a test velocity v and the
placement p, the equality $v = \dot{p}$ selecting the one velocity realized
along p.

Work is the chief mechanical quantity which can be properly integrated
over body-time lumps. It is a real-valued quantity, stemming from a duality
between vector- and covector-valued forms.⁵ The basic features of each
individual dynamical theory are encoded in the structure of this distinguished
duality. The standard model of continuum mechanics (encompassing, in
particular, Lamé’s theory of elasticity) is founded on the following
assumption.

The (so-called “virtual”) work done on a test velocity v over an
(n+1)-dimensional body-time cell is the sum of two contributions: an integral
over the cell itself, and another over its boundary (which consists of n+1
pairs of n-cells: 2n facets × the time edge of the lump, plus its bulk
element × the two ends of its time edge). The real-valued (n+1)-form to be
integrated over the lump is the sum of two exterior products: the
vector-valued 0-form v times the impulse-supply form (a covector-valued
(n+1)-form), minus the differential of v (a vector-valued 1-form) times the
impulse-flux form (a covector-valued n-form). The real-valued n-form to be
integrated over the cell boundary is the exterior product of v times the
boundary-impulse form, a covector-valued n-form living on the cell boundary.

An overriding balance principle (see [18-22]) commands that the total
(“virtual”) work done on any test velocity over any body-time lump should
be zero, implying that: (i) the impulse-supply form and the exterior derivative
of the impulse-flux form should add up to the null covector-valued
(n+1)-form; (ii) the boundary-impulse form on any n-cell should match the
impulse-flux form (a body-time version of the celebrated Cauchy stress
theorem).
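
The step from the work assumption to consequences (i) and (ii) can be sketched in formulas. The symbols are introduced here for illustration only and are not the author’s: c denotes a body-time lump, S the impulse-supply form, F the impulse-flux form, T the boundary-impulse form, a dot the duality pairing between vector and covector values, and a dotted wedge the exterior product combined with that pairing. The sketch relies on the affine setting adopted above, so that the Leibniz rule and Stokes’ theorem apply as for real-valued forms.

```latex
% Assumed notation (not from the source): c = body-time lump, v = test velocity,
% S = impulse-supply (n+1)-form, F = impulse-flux n-form, T = boundary-impulse n-form.
\begin{align*}
W(v; c) &= \int_{c} \bigl( v \cdot S - \mathrm{d}v \mathbin{\dot\wedge} F \bigr)
          + \int_{\partial c} v \cdot T , \\
\mathrm{d}(v \cdot F) &= \mathrm{d}v \mathbin{\dot\wedge} F + v \cdot \mathrm{d}F
  \qquad \text{(Leibniz rule, $v$ being a 0-form)} , \\
W(v; c) &= \int_{c} v \cdot ( S + \mathrm{d}F )
          + \int_{\partial c} v \cdot ( T - F )
  \qquad \text{(Stokes' theorem)} , \\
W(v; c) = 0 \ \text{for all } v, c
  &\;\Longrightarrow\; S + \mathrm{d}F = 0 , \qquad T = F \ \text{on each $n$-cell} .
\end{align*}
```

The last line uses the arbitrariness of both the test velocity and the lump; it is meant as a heuristic outline of the argument, not as a substitute for the treatments cited in [18-22].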

In comparison with Maxwell-Faraday’s gossamer edifice, continuum mechanics
has much more robustness than fineness in it: granted that time is
one-dimensional, its basic structure can easily accommodate any space
dimension. Also body dimension may be chosen freely, provided it does not
exceed space dimension.

⁴ My interpretation is that a test velocity attaches to each 0-cell {b}×{i}
(b a point, i an instant) the difference $p_\varepsilon(b,i) - p(b,i)$
between the place assigned to it by a juxtaposed placement $p_\varepsilon$,
for vanishingly small $\varepsilon$, and that assigned by p, it being intended
that $\lim_{\varepsilon \to 0} p_\varepsilon = p$ (cf. [18]). In other words,
test displacements develop in an extra subsidiary time dimension,
parameterized by the pseudo-time $\varepsilon$.

⁵ Playing with the (real-valued) time variable, other real-valued
work-related ancillary quantities may be introduced, such as the time rate of
work, namely the power (or working), or its time integral, namely the action.

3 Electromagnetic-Mechanical Coupling 

On the surface, coupled electromagnetic-mechanical problems are mostly
taken as issues in computer-aided design, of a quite difficult nature,
admittedly.⁶ This is no surprise, given the lasting division between Maxwell’s
and Lamé’s disciples, further aggravated in their pet computer codes,
laboriously developed by niche experts.

However, the heart of the matter lies much deeper. First of all, lumping 
electromagnetic and mechanical effects into distinct components of the same 
device is not always possible. Therefore, loose coupling between pre-existent 
specialized simulators may not suffice. Second, and more important: the elec- 
tromagnetic and the mechanical response of a medium cannot in general be 
characterized independently of each other. Therefore, a unified understanding 
of both disciplines cannot be dispensed with — at least in principle. 

3.1 A Sound Basis for the Electrodynamics of Deformable Media 

“[T]he issue of force densities in material media is the most controversial,
the least investigated, and the least understood topic of classical
electromagnetism.” This is how it is put by the editor of the Academic Press
Electromagnetism series, Isaak Mayergoyz, in his foreword to the recent
book [25] by the late Scipione Bobbio.

In this book Bobbio details several specific materials. In all cases, the key
point is that sound constitutive assumptions for the free-energy density
(ascribed to matter) and for the electromagnetic energy (ascribed to ether⁷)
cannot be laid down independently of each other. This point is touched
upon also in [16, Sect. 1], where Henrotte and Hameyer write: “In order to
define specifically electromagnetic forces, we have to use electromagnetic
energy functionals instead of total energy functionals.” Then they admit that
defining “such restricted functionals” is not at all obvious, e.g., for a
magnetostrictive material.

In the case of an electrically linear fluid dielectric, Bobbio shows that 
Helmholtz’s and Kelvin’s formulas for the electric force density — which noto- 
riously disagree — correspond to a different splitting of the same total energy 
density into ether- and matter-related terms — hence, to different prescrip- 
tions for the mechanical force density, i.e., for pressure. A long-lasting — and 
quite idle — controversy between supporters of either formula is dispelled this 
way. See [25, Sect. 4.7] for a review of the experimental side of the dispute. 

® Such difficulties were vividly reported at the workshop. See in particular [23,24]. 
^ This is my own wording, inspired also by a recent mind-teasing contribution by
Ericksen [26]. Bobbio uses the neuter term “field”. 
The basic structure underlying Maxwell’s stress tensor. The Maxwell
stress tensor represents the fundamental coupling between electromagnetism 
and mechanics. I need not belabour this point, which is so neatly expounded 
by Henrotte and Hameyer [16]. I agree with them and dissent from [22, 
Sect. 4.2]. 

Let us consider a placement $p$, the place-valued field defined in Sect. 2.2.
It is now convenient to associate with $p$ the mapping $(b,t) \mapsto (p(b,t),t)$,
which embeds body-time into space-time ($b$ is a point, $t$ an instant). By a
slight abuse of language, I will denote also this trivial extension by $p$. Let
$c$ be a body-time $k$-cell, and $\varphi$ a spacetime-conjugate $k$-form (Fig. 1). Of
course, $\varphi$ cannot be integrated on $c$, but it makes sense to integrate it on the
push-forward of $c$ by $p$, namely $p_* c$. By definition, the same result is obtained
integrating on $c$ the pull-back of $\varphi$ by $p$, namely $p^* \varphi$:

$$ \langle p^* \varphi, c \rangle = \langle \varphi, p_* c \rangle , $$

where integration is aptly denoted as a duality between $k$-forms and $k$-cells.
Given $c$, the value of the integral depends on both $p$ and $\varphi$, which may
be varied independently of each other. In particular, a test displacement (cf.
footnote 4 on page 8) may be added to the placement $p$ and the
spacetime-conjugate form $\varphi$ changed, so as to keep its body-time pull-back $p^* \varphi$ fixed.
This kind of operation is central to a proper definition (and an efficient
computation) of Maxwell’s stress tensor and related force densities, as clearly
pointed out by Henrotte and Hameyer [16] (cf. my Fig. 1 with their Fig. 1).




Fig. 1. Placement-related cells and forms 



What is to be done? In the closing paragraph of [3], Dyson praises Maxwell 
as the forerunner of mainstream twentieth-century physics: 
The ultimate importance of the Maxwell theory is far greater than
its immediate achievement in explaining and unifying the phenomena 
of electricity and magnetism. Its ultimate importance is to be the 
prototype of all the great triumphs of twentieth-century physics. 

My personal view is that Maxwell’s theory, while paving the way to quan- 
tum mechanics, interacted early on with the true Newtonian chapters of 
classical mechanics, i.e., the classical theories of inertia and gravitation. This
conceptual integration produced the special and general theories of relativity, 
well within the first two decades of the twentieth century. On the contrary, 
the integration of Maxwell’s electromagnetic theory with the main body of 
classical mechanics, i.e., the mechanics of deformable continuous media,^ still
remains to be done, nearly a century later. 

Should we succeed in this endeavour, we could ascribe our accomplishment
exclusively to the “Geometers” of our own century, as justly (or as
unjustly) as Lamé did with his Physique mathématique, proprement dite.
While skeptical with regard to our ability to counter parochialism in science 
and education, I am quite confident that the present strong trend towards 
miniaturization — down to the nano-scale — could make a unified electro-me- 
chanics of deformable media one of the most useful physical theories in the 
near future. 
Abstract. In this article, we describe two types of conservative variational
techniques that aim at improving the use of FDTD methods for the treatment of
complex geometries with time dependent Maxwell’s equations.



1 Introduction 



Although very old, the finite differences time domain methods (FDTD in the 
electromagnetic literature) remain very popular and are widely used for time 
dependent numerical simulations of electromagnetic wave propagation. These 
methods allow us to get discrete equations whose unknowns are generally 
field values at points of a regular mesh with spatial step $h$ and time step $\Delta t$.
For Maxwell’s equations, the Yee scheme [12], [13] is a prototype of such a
scheme. In 1D, it concerns the following system:



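$$ \frac{\partial u}{\partial t} + \frac{\partial v}{\partial x} = 0, \qquad \frac{\partial v}{\partial t} + \frac{\partial u}{\partial x} = 0. \tag{1} $$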

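Without any mesh refinement, the equations of the scheme are

$$ \frac{u_j^{n+1} - u_j^{n}}{\Delta t} + \frac{v_{j+\frac12}^{n+\frac12} - v_{j-\frac12}^{n+\frac12}}{h} = 0, \qquad \frac{v_{j+\frac12}^{n+\frac12} - v_{j+\frac12}^{n-\frac12}}{\Delta t} + \frac{u_{j+1}^{n} - u_j^{n}}{h} = 0, \tag{2} $$

where the discrete unknowns are evaluated on a staggered uniform grid. There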
are several reasons that explain the success of Yee type schemes, among which 
are their easy implementation, their efficiency linked to the use of a uniform 
regular grid together with an explicit time discretization, and the fact that
many properties of the continuous Maxwell’s equations (energy conservation,
divergence-free property, ...) are respected at the discrete level. The stability and
accuracy properties of such a scheme are well known. As a consequence of its 
explicit nature, the scheme is stable under the C.F.L. condition (c denotes 
the propagation velocity and d the space dimension) 



$$ \frac{c\,\Delta t}{h} \le \frac{\sqrt{d}}{d}. \tag{3} $$



This implies that the time step cannot be too large but this is not restrictive 
in practice since a sufficient accuracy requires a small time step. On the other 
hand, it must not be too small either because, as is well known, the numerical
dispersion, roughly speaking the error committed on the propagation velocity
of waves, increases when the ratio $c\,\Delta t/h$ decreases.
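
The interplay between the staggered grid, the explicit update and the CFL bound can be illustrated by a minimal sketch of a 1D Yee-type scheme. The periodic boundary conditions, the sign convention $u_t + v_x = 0$, $v_t + u_x = 0$, the unit propagation velocity and the names `yee_1d_step`, `discrete_energy`, `run` and `r` (for $\Delta t/h$) are assumptions made for this example, not taken from the text.

```python
import numpy as np

def yee_1d_step(u, v_half, r):
    """One leapfrog step of a Yee-type scheme on a periodic staggered grid.

    u      : values u_j^n at the integer nodes
    v_half : values v_{j+1/2}^{n-1/2} at the half nodes
    r      : ratio dt/h (propagation velocity c = 1 is assumed)
    """
    # v goes from time n-1/2 to n+1/2, using the difference u_{j+1} - u_j
    v_new = v_half - r * (np.roll(u, -1) - u)
    # u goes from time n to n+1, using the difference v_{j+1/2} - v_{j-1/2}
    u_new = u - r * (v_new - np.roll(v_new, 1))
    return u_new, v_new

def discrete_energy(u, v_old, v_new):
    """Quadratic form built from u^n, v^{n-1/2}, v^{n+1/2} (factor h omitted)."""
    return 0.5 * np.sum(u * u) + 0.5 * np.sum(v_old * v_new)

def run(n_cells=64, r=0.9, n_steps=200):
    """Propagate a sine wave and record the discrete energy at every step."""
    x = np.arange(n_cells) / n_cells
    u = np.sin(2 * np.pi * x)
    v = np.zeros(n_cells)
    energies = []
    for _ in range(n_steps):
        u_next, v_next = yee_1d_step(u, v, r)
        energies.append(discrete_energy(u, v, v_next))
        u, v = u_next, v_next
    return np.array(energies)
```

In this sketch the recorded quantity stays constant up to round-off for any `r`; it is the positivity of this quadratic form, and hence its use as a norm, that requires the CFL restriction on `r`.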

The counterpart of the nice properties of FDTD schemes is a lack of
“geometrical flexibility” which makes the use of such finite difference schemes
not obvious in the case of computational domains of complicated shape
(consider here the diffraction of electromagnetic waves by an obstacle as a target
problem). It may also be difficult (at least with a theoretical guarantee of
stability) to treat boundary conditions and variable coefficients or to be able
to do mesh refinement. To overcome such difficulties there exist at least two
attractive solutions:

(i) The variational methods, in particular the finite element methods. 

(ii) The finite integration technique [4] and the finite volume methods [11]. 

These are “natural” extensions in the sense that, for instance, the Yee scheme 
can be interpreted as a particular mixed finite element method or a particular 
finite volume method on a uniform grid. My objective in this paper will be to 
review briefly two recent works that aim at making the treatment of complex
geometries with FDTD schemes possible while preserving the nice properties 
of these methods: 

— The data of the problem remain (mostly) structured, 

— The time discretization remains (essentially) explicit, 

— The stability condition is not affected by the geometry of the domain. 

2 Fictitious domain methods 

Let us consider the model problem of the scattering of an incident
electromagnetic wave by a perfectly conducting obstacle. The idea of the method
consists in artificially extending the solution inside the obstacle, which makes
the use of a 3D uniform regular grid for the electromagnetic field possible,
and in introducing at the same time a conforming surface mesh for the
boundary of the obstacle to handle the boundary condition. On this mesh, one
computes an auxiliary unknown that can be interpreted as a Lagrange
multiplier associated with the boundary condition and coincides, in this case, with the
surface electric current. The challenge is then to let these two “independent”
meshes communicate in a clever way. This can be done through the use of a
mixed variational formulation in which the boundary condition is taken into
account in a weak sense. The stability of the method is ensured through a
discrete energy conservation and the stability condition is the one of the pure
FDTD scheme. The only additional computational cost (with respect to the
standard FDTD scheme) is reduced to the boundary mesh: a sparse positive
definite linear system has to be solved at each time step.
2.1 The fictitious domain formulation



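Our model problem is the scattering of an electromagnetic wave (in a
homogeneous medium with $\varepsilon = \mu = 1$) by an obstacle $\mathcal{O}$, with exterior $D$. We
impose the perfectly conducting boundary condition on $\gamma = \partial D = \partial \mathcal{O}$:

$$ \left\{ \begin{array}{ll}
\dfrac{\partial e}{\partial t} - \operatorname{rot} h = 0, & x \in D, \\[1ex]
\dfrac{\partial h}{\partial t} + \operatorname{rot} e = 0, & x \in D, \\[1ex]
n \times (e \times n) = 0, & \text{on } \gamma = \partial D.
\end{array} \right. \tag{4} $$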



We assume that the incident wave is generated by initial conditions (omitted 
here) at time t = 0. In order to have a finite computational domain, the clas- 
sical technique consists in bounding the domain D and in imposing absorbing 
conditions on the exterior boundary ([12], [6]). For the sake of simplicity, a
perfectly conducting condition is assumed on the exterior boundary as well,
and, for our purpose, we choose the geometry of the external boundary (which
does not interest us here) to be rectangular or parallelepipedic. We denote
by $\Omega$ this bounded domain and the box $\Omega \cup \mathcal{O}$ by $C$.




Fig. 1. The geometry of the problem. 



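The main idea of the FDM is to extend the electromagnetic field $(e, h)$
from $\Omega$ to the enlarged domain $C$ by 0 inside $\mathcal{O}$. Introducing as an additional
boundary unknown the electric current

$$ j = (h \times n)|_{\gamma}, \tag{5} $$

it is easy to see that the extended unknowns $(e, h)$ satisfy in the sense of
distributions:

$$ \left\{ \begin{array}{ll}
\dfrac{\partial e}{\partial t} - \operatorname{rot} h = j\,\delta_\gamma, & \text{in } C, \\[1ex]
\dfrac{\partial h}{\partial t} + \operatorname{rot} e = 0, & \text{in } C, \\[1ex]
n \times (e \times n) = 0, & \text{on } \partial C,
\end{array} \right. \tag{6} $$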
where $\delta_\gamma$ is the Dirac measure supported by $\gamma$. To get the variational
fictitious domain formulation, we multiply the first equation of (6) by a test
field $e_t$ (which only depends on the space variable $x$; the subscript $t$ here
stands for “test function” and has no relationship with the time variable)
and integrate over $\Omega$. Similarly, we multiply the second equation of (6) by a
test field $h_t$, integrate over $\Omega$ and apply an integration by parts. One completes
the formulation by writing the boundary condition in the weak sense: the
equation is multiplied by a surface test field $j_t$ and one integrates over $\gamma$. The
resulting formulation is the following:




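Find $(e, h, j) : \mathbb{R}^+ \to U \times V \times L$ such that

$$ \left\{ \begin{array}{ll}
\dfrac{d}{dt}(e, e_t) - b(e_t, h) - c(j, e_t) = 0, & \forall e_t \in U, \\[1ex]
\dfrac{d}{dt}(h, h_t) + b(e, h_t) = 0, & \forall h_t \in V, \\[1ex]
c(j_t, e) = 0, & \forall j_t \in L,
\end{array} \right. \tag{7} $$

with

$$ (u, u_t) = \int_\Omega u \cdot u_t \, dx, \qquad b(e, h) = \int_\Omega h \cdot \operatorname{rot} e \, dx, \qquad c(j, e) = \int_\gamma j \cdot \pi_\gamma e \, d\gamma, \tag{8} $$

where $\pi_\gamma$ denotes the trace map $\pi_\gamma e = n \times (e \times n)|_\gamma$. The appropriate
functional spaces are:

$$ \left\{ \begin{array}{l}
U = H_0(\operatorname{rot}, \Omega) = \{ u \in L^2(\Omega)^3 : \operatorname{rot} u \in L^2(\Omega)^3, \; n \times (u \times n) = 0 \text{ on } \partial\Omega \}, \\[1ex]
V = L^2(\Omega)^3, \qquad L = H_\parallel^{-1/2}(\operatorname{div}_\gamma, \gamma) \quad \text{(see [3] for a precise definition).}
\end{array} \right. $$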

One of the main differences between this approach and a standard conform- 
ing finite element approach lies in the fact that the boundary condition is 
taken into account in a weak sense instead of being imposed in the functional 
space. In the formulation, there is no mention of the geometry of the problem,
namely the boundary $\gamma$, in the functional spaces for the volume unknowns $e$
and $h$. The geometry only appears in $c(\cdot, \cdot)$ and $L$. We can also interpret the
unknown $j$ as a source term distributed on $\gamma$. If this source term were known,
we would simply have to solve the wave equation in a square with a right-hand
side: the FDTD approach makes sense in such a geometric situation. In fact,
$j$ is unknown and becomes a control variable in order to make $e$ satisfy the
boundary condition on $\gamma$. The above approach is referred to as the fictitious
domain method (FDM).



There is an analogy between the FDM and the integral equations for
scattering problems [1]. Indeed, in this kind of method $j$ typically is the quantity
that is chosen as the unknown. Nevertheless let us point out a very important
difference between our approach and these methods. Integral equations are
known to lead, after discretization, to the solution of full linear systems;
as will be shown later, this will not be the case for the FDM.



2.2 Finite element approximation and time discretization 

Space discretization. One obtains the semi-discrete (in space) scheme by
replacing in (7) the functional spaces $U$, $V$ and $L$ by appropriate finite
dimensional spaces $U_h$, $V_h$ and $L_H$ defined as follows. We consider a uniform mesh
$\mathcal{T}_h$ made of equal cubes $K \in \mathcal{T}_h$ of side $h$. The appropriate approximation
spaces $U_h$ and $V_h$ are:

$$ \left\{ \begin{array}{l}
U_h = \{ u_h \in U : \forall K \in \mathcal{T}_h, \; u_h|_K \in Q_{0,1,1} \times Q_{1,0,1} \times Q_{1,1,0} \}, \\[1ex]
V_h = \{ v_h \in H(\operatorname{div}, \Omega) : \forall K \in \mathcal{T}_h, \; v_h|_K \in Q_{1,0,0} \times Q_{0,1,0} \times Q_{0,0,1} \},
\end{array} \right. \tag{9} $$

where $H(\operatorname{div}, \Omega) = \{ u \in L^2(\Omega)^3 : \operatorname{div} u \in L^2(\Omega) \}$ and where $Q_{p_1,p_2,p_3}$ is
the set of polynomials of three variables whose degree with respect to the $i$-th
variable is less than or equal to $p_i$. In addition, one uses adapted quadrature
formulas for approximating the various integrals appearing in (8).

One can notice that, contrary to what one could expect, we do not use a
space of completely discontinuous elements to approximate $V = L^2(\Omega)^3$: we
use vector fields whose normal component is continuous across each face of
the mesh. The spaces defined by (9) are known as edge elements for the
electric field and face elements for the magnetic field (see [10]). In particular a
set of degrees of freedom is given by (see figure 2):

— For the space $U_h$ (degrees of freedom for the electric field): the (constant)
tangential component of the vector field along each edge of the mesh. In
the sequel, we shall denote by $\mathbf{e}$ the vector of degrees of freedom of $e \in U_h$.

— For the space $V_h$ (degrees of freedom for the magnetic field): the (constant)
normal component of the vector field on each face of the mesh.
We denote by $\mathbf{h}$ the vector of degrees of freedom of $h \in V_h$.



For the space $L_H$, we consider an approximation of the surface $\gamma$ made
of a piecewise linear surface with triangular facets of diameter less than or
equal to $H$ (see figure 2 for a sphere) and consider the corresponding
Raviart-Thomas tangential face element space described in [1] for instance. Let us
simply recall that this space is made of tangential surface vector fields which
are piecewise linear and have a constant normal component on each edge of
the mesh. These fluxes constitute the degrees of freedom for this surface
unknown (see figure 2). We shall denote by $\mathbf{j}$ the corresponding vector of degrees
of freedom.

Fig. 2. [...] elements (right)

The space semi-discretization results in an algebraic-differential
system of the following type:



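$$ \left\{ \begin{array}{l}
M_E \dfrac{d\mathbf{e}}{dt} - B\,\mathbf{h} - C^*\mathbf{j} = 0, \\[1ex]
M_H \dfrac{d\mathbf{h}}{dt} + B^*\mathbf{e} = 0, \\[1ex]
C\,\mathbf{e} = 0.
\end{array} \right. \tag{10} $$

— The two mass matrices $M_E$ and $M_H$ are diagonal.

— The two matrices $B$ and $B^*$ represent discrete rotational operators.

— The matrix $C$ represents a discrete tangential trace operator.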



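For the time discretization, we introduce a constant time step $\Delta t > 0$ and
apply a standard staggered grid centered scheme:

$$ \left\{ \begin{array}{l}
M_E \dfrac{\mathbf{e}^{n+1} - \mathbf{e}^{n}}{\Delta t} - B\,\mathbf{h}^{n+\frac12} - C^*\mathbf{j}^{n+\frac12} = 0, \\[1ex]
M_H \dfrac{\mathbf{h}^{n+\frac12} - \mathbf{h}^{n-\frac12}}{\Delta t} + B^*\mathbf{e}^{n} = 0, \\[1ex]
C\,\mathbf{e}^{n+1} = 0.
\end{array} \right. \tag{11} $$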



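For the practical computations, we remark that the second equation of (11)
allows us to compute $\mathbf{h}^{n+\frac12}$ explicitly from $\mathbf{h}^{n-\frac12}$ and $\mathbf{e}^{n}$. Furthermore, if $\mathbf{j}^{n+\frac12}$
is known, the first equation of (11) allows us to compute $\mathbf{e}^{n+1}$. It remains to
determine the equation to compute $\mathbf{j}^{n+\frac12}$. This is where we use the third
equation of (11) that we combine with the first one to obtain:

$$ C\,M_E^{-1} C^* \, \mathbf{j}^{n+\frac12} = - C\,M_E^{-1} B\, \mathbf{h}^{n+\frac12}. \tag{12} $$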



For each time step, the algorithm consists in: 

— compute two steps of explicit scheme, i.e. the first two equations of (11), 
which are nothing but the standard Yee scheme with surface source term. 
— solve the linear system (12), reduced to the surface unknown $\mathbf{j}^{n+\frac12}$.

We see that, in comparison with a standard FDTD procedure inside $C$, the
only additional cost is the inversion of the matrix $Q = C\,M_E^{-1} C^*$. This
cost is marginal, due to the properties of $Q$ (see section 2.3). In fact, from
the computational point of view, the difficult step in the implementation lies
in the construction of the matrix $C$ (and then of $Q$): in 3D, it implies the
determination of the intersections between the cubic mesh and the surface
mesh. See [7], for instance, for more details.
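
The structure of one time step (two explicit updates plus one small surface solve) can be sketched on generic matrices. This is only an algebraic toy under stated assumptions: the matrices are random rather than assembled from meshes, the mass matrices are stored as 1D arrays named `M_E` and `M_H`, a dense solver replaces the sparse one that would be used in practice, and the names `fdm_step` and `demo` are hypothetical.

```python
import numpy as np

def fdm_step(M_E, M_H, B, C, e, h, dt):
    """One time step of a leapfrog scheme with a Lagrange multiplier.

    M_E, M_H : 1D arrays holding the diagonal mass matrices
    B        : (nE, nH) discrete rotational operator
    C        : (nL, nE) discrete tangential trace operator
    e, h     : e^n (assumed to satisfy C e^n = 0) and h^{n-1/2}
    """
    # explicit update of the magnetic unknown
    h_new = h - dt * (B.T @ e) / M_H
    # surface system: Q j = -C M_E^{-1} B h^{n+1/2}, with Q = C M_E^{-1} C^T
    Q = C @ (C.T / M_E[:, None])
    rhs = -C @ ((B @ h_new) / M_E)
    j = np.linalg.solve(Q, rhs)
    # explicit update of the electric unknown, with the surface source term
    e_new = e + dt * (B @ h_new + C.T @ j) / M_E
    return e_new, h_new, j

def demo(n_steps=50, dt=0.05, seed=0):
    """Run a few steps on random data and return the worst violation of C e = 0."""
    rng = np.random.default_rng(seed)
    nE, nH, nL = 12, 10, 3
    M_E = rng.uniform(1.0, 2.0, nE)
    M_H = rng.uniform(1.0, 2.0, nH)
    B = rng.standard_normal((nE, nH))
    C = rng.standard_normal((nL, nE))  # full row rank with probability one
    e = np.zeros(nE)                   # trivially satisfies C e = 0
    h = rng.standard_normal(nH)
    worst = 0.0
    for _ in range(n_steps):
        e, h, j = fdm_step(M_E, M_H, B, C, e, h, dt)
        worst = max(worst, float(np.abs(C @ e).max()))
    return worst
```

The only implicit work per step is the solve with the small matrix `Q`, whose size is the number of surface unknowns; everything else is the explicit scheme with an extra source term.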

2.3 Theoretical issues 

Existence of the discrete solution. One easily observes that

— Q is symmetric and positive (by construction !), 

— Q is a “small” matrix (since “reduced to the surface”), 

— Q is a sparse matrix with narrow bandwidth. 

The (crucial) remaining question is the definiteness of $Q$, which corresponds
to the fact that the kernel of the matrix $C^*$ is equal to $\{0\}$, or equivalently that $C$
is surjective from $U_h$ onto $L_H$. This suggests that the space $L_H$ must not be
too large, or in other words that one must not impose too many “boundary”
constraints on the discrete solution. In fact, as for any mixed method, there is
a compatibility condition between the two spaces $U_h$ and $L_H$ that can be reduced
to a compatibility relation between the two meshes of $C$ and $\gamma$: the volume
mesh cannot be too coarse with respect to the boundary mesh or, roughly
speaking, the ratio $H/h$ must be large enough. In this sense, the two meshes
cannot be completely independent [5], [7].

Stability analysis. The numerical scheme is stable under the same CFL
condition as without the obstacle. This is a consequence of the conservation of
the following discrete electromagnetic energy (the proof is straightforward):

$$ E^{n} = \frac12 \left( M_E \mathbf{e}^{n}, \mathbf{e}^{n} \right) + \frac12 \left( M_H \mathbf{h}^{n+\frac12}, \mathbf{h}^{n-\frac12} \right). \tag{13} $$

Finally, one can prove that this quantity is a “true” energy, namely a positive
quantity that constitutes a norm of the discrete solution, if (3) holds.
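
The conservation argument can be sketched in a few lines. The sketch assumes that the energy is taken in the usual leapfrog form $E^{n} = \frac12 (M_E \mathbf{e}^{n}, \mathbf{e}^{n}) + \frac12 (M_H \mathbf{h}^{n+\frac12}, \mathbf{h}^{n-\frac12})$, that $M_E$ and $M_H$ denote the (symmetric) mass matrices, and that $(\cdot,\cdot)$ is the Euclidean inner product; these notations are assumptions of the sketch. The last equality uses the constraint $C\mathbf{e}^{n} = C\mathbf{e}^{n+1} = 0$.

```latex
\begin{aligned}
E^{n+1} - E^{n}
 &= \tfrac12 \big( M_E(\mathbf{e}^{n+1} - \mathbf{e}^{n}),\, \mathbf{e}^{n+1} + \mathbf{e}^{n} \big)
  + \tfrac12 \big( M_H(\mathbf{h}^{n+\frac32} - \mathbf{h}^{n-\frac12}),\, \mathbf{h}^{n+\frac12} \big) \\
 &= \tfrac{\Delta t}{2} \big( B\,\mathbf{h}^{n+\frac12} + C^*\mathbf{j}^{n+\frac12},\, \mathbf{e}^{n+1} + \mathbf{e}^{n} \big)
  - \tfrac{\Delta t}{2} \big( B^*(\mathbf{e}^{n+1} + \mathbf{e}^{n}),\, \mathbf{h}^{n+\frac12} \big) \\
 &= \tfrac{\Delta t}{2} \big( \mathbf{j}^{n+\frac12},\, C(\mathbf{e}^{n+1} + \mathbf{e}^{n}) \big) = 0 .
\end{aligned}
```

The multiplier therefore does no work on the discrete field, which is why the stability condition is the same as without the obstacle.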

Convergence analysis. For simplicity, let us restrict ourselves to the case 
of the semi-discrete problem (the treatment of the time discretization does 
not pose any specific difficulty). The convergence of the method requires not 
only the invertibility of the matrix Q but also a uniform bound (uniform with 
respect to the approximation parameters h and H) of an appropriate norm 
of its inverse $Q^{-1}$. The general theory ([2]) says that this is equivalent to
verifying the so-called uniform inf-sup condition, whose verification leads to a
compatibility condition between the sizes of the two meshes, namely (see [8]
or [9] for instance) $H \ge C h$ for some $C > 0$. It is difficult to get an explicit
value for $C$ but in practice it suffices to take $C$ slightly greater than 1.
Accuracy of the fictitious domain method. The counterpart to the good
properties of the fictitious domain method, in terms of simplicity and robustness,
is its limited accuracy. The limitation is due to the fact that the regularity of
the exact solution in $C$ (or inside an element $K$ of the volume mesh) is
limited, independently of the smoothness of the data of the problem, since the
tangential component of $h$ presents a jump across $\gamma$. As a consequence, the
method is only of order 1, i.e. the error is bounded by a constant times $h$.
However, numerical experiments (see [5] for instance) demonstrate that the
fictitious domain method provides a better accuracy than the FDTD method
combined with a staircase approximation of the boundary of the obstacle. In this
last case, the approximation of the geometry induces artificial singularities in
the surface of the scatterer: these singularities provoke spurious diffractions
that are not present with the fictitious domain method.

3 Space-time mesh refinement and domain decomposition





Fig. 3. 2D slice of $\Omega_f$ and $\Omega_c$ (left) - 3D view of the refinement (right)



An alternative approach (that can be combined with the fictitious domain 
method) consists in refining the mesh in the neighborhood of the obstacle. I
will present below some recent research about conservative space-time mesh 
refinement methods. When one works with regular grids, the transition be- 
tween a coarse and a fine grid is necessarily “non conforming”. Moreover, 
for efficiency and accuracy considerations, one would like to use a local time 
step in order to keep the ratio time step/space step constant. Interpolation 
methods that can be traditionally found in the literature can lead to non stan- 
dard instability phenomena. Here, we shall propose two alternative methods 
based on the reformulation of the problem as an artificial domain decom- 
position problem. The key issue of these methods is that their stability is 
guaranteed from the theoretical point of view through the conservation of an 
appropriate discrete energy. The first method involves the introduction of a 
Lagrange multiplier on the interface coarse grid / fine grid (as in the so-called
mortar element method), the second does not. As for the fictitious domain
method, both methods lead to the inversion of a small sparse positive definite 
linear system on the interface. We shall also show how spurious numerical 
phenomena due to a change of grid can be analyzed and controlled. 



3.1 The domain decomposition approach

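As a model problem, we consider Maxwell’s equations in the whole space $\mathbb{R}^3$
(with $\varepsilon = \mu = 1$):

$$ \left\{ \begin{array}{ll}
\dfrac{\partial e}{\partial t} - \operatorname{rot} h = 0, & x \in \mathbb{R}^3, \\[1ex]
\dfrac{\partial h}{\partial t} + \operatorname{rot} e = 0, & x \in \mathbb{R}^3.
\end{array} \right. \tag{14} $$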



Our goal is to solve this problem numerically by domain decomposition, using
locally a grid two times finer than in the rest of the domain. To be more
precise, we consider the fine grid domain $\Omega_f$ with boundary $\Sigma$ (with unit
outward normal vector $n$), the coarse grid domain $\Omega_c$ denoting the exterior
of $\Omega_f$. The domain $\Omega_f$ is the one that we shall discretize with a fine grid
of stepsize $h$ and $\Omega_c$ is the one that we shall discretize with a coarse grid of
step $2h$ (see figure 3). In what follows, $(e_c, h_c)$ (resp. $(e_f, h_f)$) will denote
the restriction to $\Omega_c$ (resp. $\Omega_f$) of $(e, h)$.



Saying that $(e, h)$ is a solution of (14) is equivalent to saying that $(e_c, h_c)$
and $(e_f, h_f)$ are solutions of the same equations (14) respectively in $\Omega_c$ and
$\Omega_f$ and are coupled by the following transmission conditions (the continuity
of the tangential traces of the two fields across $\Sigma$):

$$ \left\{ \begin{array}{ll}
n \times (e_c \times n) = n \times (e_f \times n) & \text{on } \Sigma, \\[1ex]
h_c \times n = h_f \times n & \text{on } \Sigma.
\end{array} \right. \tag{15} $$



3.2 A variational formulation with interface unknown 

This formulation is similar to the fictitious domain formulation we presented
in section 2. We introduce as an additional unknown the common
tangential trace of the magnetic fields $h_c$ and $h_f$ on the interface:

$$ j = (h_f \times n)|_\Sigma = (h_c \times n)|_\Sigma. \tag{16} $$

Note that $j$ is nothing else but the surface electric current on $\Sigma$. We can then
reformulate our problem as follows. Assuming for a moment that $j$ is known,
$(e_c, h_c)$ and $(e_f, h_f)$ are the respective solutions of the following decoupled
problems, in which $j$ appears as a (boundary) source term:



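$$ \left\{ \begin{array}{lll}
\dfrac{\partial e_f}{\partial t} - \operatorname{rot} h_f = 0, & x \in \Omega_f, & \text{(i)} \\[1ex]
\dfrac{\partial h_f}{\partial t} + \operatorname{rot} e_f = 0, & x \in \Omega_f, & \text{(ii)} \\[1ex]
h_f \times n = j, & x \in \Sigma = \partial\Omega_f, &
\end{array} \right. \tag{17} $$

$$ \left\{ \begin{array}{lll}
\dfrac{\partial e_c}{\partial t} - \operatorname{rot} h_c = 0, & x \in \Omega_c, & \text{(i)} \\[1ex]
\dfrac{\partial h_c}{\partial t} + \operatorname{rot} e_c = 0, & x \in \Omega_c, & \text{(ii)} \\[1ex]
h_c \times n = j, & x \in \Sigma = \partial\Omega_c. &
\end{array} \right. \tag{18} $$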

For a given $j$, by construction, the continuity of the tangential magnetic field
is ensured. The idea of the method is to consider $j$ as a control variable in
order to ensure the continuity of the tangential electric field:

$$ n \times (e_f \times n)|_\Sigma = n \times (e_c \times n)|_\Sigma. \tag{19} $$

To derive our variational formulation, we use in each domain the same mixed
formulation as in section 2.1: once again, only the two equations involving the
rotational of the magnetic field (cf. (17)-(i) and (18)-(i)) are integrated by
parts: this makes $j$ appear in the boundary term and leads to the following
abstract formulation:



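Find $(e_c, h_c, e_f, h_f, j) : \mathbb{R}^+ \to U_c \times V_c \times U_f \times V_f \times L$ such that

$$ \left\{ \begin{array}{ll}
\dfrac{d}{dt}(e_c, e_{c,t})_c - b_c(e_{c,t}, h_c) - c_c(j, e_{c,t}) = 0, & \forall e_{c,t} \in U_c, \\[1ex]
\dfrac{d}{dt}(h_c, h_{c,t})_c + b_c(e_c, h_{c,t}) = 0, & \forall h_{c,t} \in V_c, \\[1ex]
\dfrac{d}{dt}(e_f, e_{f,t})_f - b_f(e_{f,t}, h_f) + c_f(j, e_{f,t}) = 0, & \forall e_{f,t} \in U_f, \\[1ex]
\dfrac{d}{dt}(h_f, h_{f,t})_f + b_f(e_f, h_{f,t}) = 0, & \forall h_{f,t} \in V_f, \\[1ex]
c_c(j_t, e_c) = c_f(j_t, e_f), & \forall j_t \in L.
\end{array} \right. \tag{20} $$

The appropriate functional spaces are:

$$ U_{c,f} = H(\operatorname{rot}, \Omega_{c,f}), \qquad V_{c,f} = L^2(\Omega_{c,f})^3. \tag{21} $$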
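The various bilinear forms appearing in (20) are defined by:

$$ \left\{ \begin{array}{ll}
(u_c, v_c)_c = \displaystyle\int_{\Omega_c} u_c \cdot v_c \, dx, & (u_f, v_f)_f = \displaystyle\int_{\Omega_f} u_f \cdot v_f \, dx, \\[2ex]
b_c(e_c, h_c) = \displaystyle\int_{\Omega_c} h_c \cdot \operatorname{rot} e_c \, dx, & b_f(e_f, h_f) = \displaystyle\int_{\Omega_f} h_f \cdot \operatorname{rot} e_f \, dx, \\[2ex]
c_c(j, e_c) = \displaystyle\int_{\Sigma} j \cdot \pi_\Sigma e_c \, d\sigma, & c_f(j, e_f) = \displaystyle\int_{\Sigma} j \cdot \pi_\Sigma e_f \, d\sigma,
\end{array} \right. \tag{22} $$

where $\pi_\Sigma$ denotes again the trace map: $\pi_\Sigma u = n \times (u \times n)|_\Sigma$.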



3.3 Finite element approximation. 




Fig. 4. The degrees of freedom: the electric field in blue, the magnetic field in red, 
the electric current in green 



As in section 2.2, the space discretization of (20) is obtained by replacing the
spaces $U_c$, $V_c$, $U_f$, $V_f$ and $L$ by finite dimensional approximation subspaces
$U_{c,h}$, $V_{c,h}$, $U_{f,h}$, $V_{f,h}$ and $L_h \subset L$. The spaces $U_{f,h}$ and $V_{f,h}$ (respectively
$U_{c,h}$ and $V_{c,h}$) are constructed as the spaces $U_h$ and $V_h$ in section 2.2, using
a regular cubic mesh of stepsize $h$ in $\Omega_f$ (respectively of stepsize $2h$ in $\Omega_c$).
For the space $L_h$, we have chosen as the mesh $\mathcal{T}_h(\Sigma)$ of the interface $\Sigma$ the
trace of the coarse grid mesh in $\Omega_c$, i.e. a regular mesh made of squares $K$
of side $2h$. Then, we use standard (tangential) Raviart-Thomas 2D elements,
which provide a conforming subspace of $L$:

$$ L_h = \{ j_h \in L : \forall K \in \mathcal{T}_h(\Sigma), \; j_h|_K \in Q_{1,0} \times Q_{0,1} \}. \tag{23} $$

The degrees of freedom of $j_h$ are the fluxes (or equivalently the constant
normal component) of the vector fields across (or on) each edge of the surface
mesh. The global spatial discretization is summarized in figure 4. The
semi-discrete problem can be rewritten as an algebraic-differential system (we use
the same type of notation as in section 2.2):

$$ \left\{ \begin{array}{l}
M_E^{c} \dfrac{d\mathbf{e}_c}{dt} - B_c\,\mathbf{h}_c - C_c^*\,\mathbf{j} = 0, \\[1ex]
M_H^{c} \dfrac{d\mathbf{h}_c}{dt} + B_c^*\,\mathbf{e}_c = 0, \\[1ex]
M_E^{f} \dfrac{d\mathbf{e}_f}{dt} - B_f\,\mathbf{h}_f + C_f^*\,\mathbf{j} = 0, \\[1ex]
M_H^{f} \dfrac{d\mathbf{h}_f}{dt} + B_f^*\,\mathbf{e}_f = 0, \\[1ex]
C_c\,\mathbf{e}_c - C_f\,\mathbf{e}_f = 0.
\end{array} \right. \tag{24} $$



3.4 Time discretization. 



We look for a time discretization scheme of (24) that will use a time step $\Delta t$
in $\Omega_f$ and $2\Delta t$ in $\Omega_c$. More precisely, in $\Omega_f$, the discrete unknowns will be
$\mathbf{e}_f^{n}$ and $\mathbf{h}_f^{n+\frac12}$, while in $\Omega_c$ the discrete unknowns will be $\mathbf{e}_c^{2n}$ and $\mathbf{h}_c^{2n+1}$. The two
computational grids only meet at “even instants” $t = t^{2n}$ at which the two
electric fields (in $\Omega_c$ and $\Omega_f$) will be computed simultaneously.

Concerning the interface unknown, one will choose to discretize $j_h$ with the
coarse time step (which is coherent with the choice of the coarse mesh for
the interface mesh): in other words $\mathbf{j}$ is considered constant, equal to $\mathbf{j}^{2n+1}$,
during the time interval $[t^{2n}, t^{2n+2}]$. As a consequence, the scheme that we
shall apply to the first four equations of (24) is the following:





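$$ \left\{ \begin{array}{l}
M_E^{f} \dfrac{\mathbf{e}_f^{2n+1} - \mathbf{e}_f^{2n}}{\Delta t} - B_f\,\mathbf{h}_f^{2n+\frac12} + C_f^*\,\mathbf{j}^{2n+1} = 0, \\[1ex]
M_H^{f} \dfrac{\mathbf{h}_f^{2n+\frac12} - \mathbf{h}_f^{2n-\frac12}}{\Delta t} + B_f^*\,\mathbf{e}_f^{2n} = 0, \\[1ex]
M_E^{f} \dfrac{\mathbf{e}_f^{2n+2} - \mathbf{e}_f^{2n+1}}{\Delta t} - B_f\,\mathbf{h}_f^{2n+\frac32} + C_f^*\,\mathbf{j}^{2n+1} = 0, \\[1ex]
M_H^{f} \dfrac{\mathbf{h}_f^{2n+\frac32} - \mathbf{h}_f^{2n+\frac12}}{\Delta t} + B_f^*\,\mathbf{e}_f^{2n+1} = 0,
\end{array} \right. \tag{25} $$

$$ \left\{ \begin{array}{l}
M_E^{c} \dfrac{\mathbf{e}_c^{2n+2} - \mathbf{e}_c^{2n}}{2\Delta t} - B_c\,\mathbf{h}_c^{2n+1} - C_c^*\,\mathbf{j}^{2n+1} = 0, \\[1ex]
M_H^{c} \dfrac{\mathbf{h}_c^{2n+1} - \mathbf{h}_c^{2n-1}}{2\Delta t} + B_c^*\,\mathbf{e}_c^{2n} = 0.
\end{array} \right. \tag{26} $$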



Solving these equations amounts to solving the Yee scheme in both the fine
and coarse grids with an artificial source term located on the interface
$\Sigma$. To determine this source term, we have to say how the last equation of
(24), namely the continuity of the tangential electric field, will be discretized
in time. This is the key point. This equation is taken into account in a (non
trivial) mean sense:

$$ C_c\, \frac{\mathbf{e}_c^{2n+2} + \mathbf{e}_c^{2n}}{2} - C_f\, \frac{\mathbf{e}_f^{2n+2} + 2\,\mathbf{e}_f^{2n+1} + \mathbf{e}_f^{2n}}{4} = 0. \tag{27} $$

The derivation of this equation is in fact driven by the fact that, in order to
ensure the stability of the scheme, one wants to guarantee the conservation
of some discrete energy. Proceeding as in section 2.3, formula (13), we can
define a fine grid energy $E_f^{n}$ at times $t^{n}$ and a coarse grid energy $E_c^{2n}$ at times
$t^{2n}$. The total energy is only defined at the even instants:

$$ E^{2n} = E_f^{2n} + E_c^{2n}. \tag{28} $$

It is not difficult to deduce from equations (25) and (26) the identity:

$$ \frac{E^{2n+2} - E^{2n}}{2\Delta t} = \mathbf{j}^{2n+1} \cdot \left( C_c\, \frac{\mathbf{e}_c^{2n+2} + \mathbf{e}_c^{2n}}{2} - C_f\, \frac{\mathbf{e}_f^{2n+2} + 2\,\mathbf{e}_f^{2n+1} + \mathbf{e}_f^{2n}}{4} \right). \tag{29} $$



Using the discrete transmission equation (27), one deduces that the total
energy is a constant quantity. The stability results from the fact that this
total energy is a norm under the strict CFL condition (namely (3) with strict
inequality). For the computations, we remark that, assuming that all the
unknowns have been computed up to time $t^{2n}$, it is easy to see that, once
$\mathbf{j}^{2n+1}$ is known, equations (25) - (26) allow us to explicitly compute
$\mathbf{h}_f^{2n+\frac12}$, $\mathbf{e}_f^{2n+1}$, $\mathbf{h}_f^{2n+\frac32}$, $\mathbf{e}_f^{2n+2}$, $\mathbf{h}_c^{2n+1}$ and $\mathbf{e}_c^{2n+2}$.

To be complete, we must find an equation that allows us to compute $\mathbf{j}^{2n+1}$.
After some manipulations, one can show that $\mathbf{j}^{2n+1}$ is the solution of the
following linear system ($\mathbf{F}^{2n}$ is a known quantity from previous time steps):

$$ \left( C_c\,(M_E^{c})^{-1} C_c^* + C_f\, Q(\Delta t)\, C_f^* \right) \mathbf{j}^{2n+1} = \mathbf{F}^{2n}, \tag{30} $$

where the matrix $Q(\Delta t)$ is defined by:

$$ Q(\Delta t) = (M_E^{f})^{-1} - \frac{\Delta t^2}{4}\, (M_E^{f})^{-1} B_f\, (M_H^{f})^{-1} B_f^*\, (M_E^{f})^{-1}. \tag{31} $$
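
The manipulations leading to the interface system can be sketched as follows. The sketch assumes a fine-grid source term $+C_f^*\mathbf{j}$ and a coarse-grid source term $-C_c^*\mathbf{j}$, writes $M_E^{f}$, $M_H^{f}$, $M_E^{c}$ for the diagonal mass matrices and $\mathbf{j}$ for $\mathbf{j}^{2n+1}$ (all notational assumptions), and tracks only the part of each unknown that depends on $\mathbf{j}$; the dots stand for terms known from previous time steps.

```latex
\begin{aligned}
\mathbf{e}_f^{2n+1} &= -\Delta t\,(M_E^{f})^{-1} C_f^*\,\mathbf{j} + \dots \\
\mathbf{h}_f^{2n+\frac32} &= \Delta t^{2}\,(M_H^{f})^{-1} B_f^* (M_E^{f})^{-1} C_f^*\,\mathbf{j} + \dots \\
\mathbf{e}_f^{2n+2} &= -2\Delta t\,(M_E^{f})^{-1} C_f^*\,\mathbf{j}
   + \Delta t^{3}\,(M_E^{f})^{-1} B_f (M_H^{f})^{-1} B_f^* (M_E^{f})^{-1} C_f^*\,\mathbf{j} + \dots \\
\tfrac14\big(\mathbf{e}_f^{2n+2} + 2\,\mathbf{e}_f^{2n+1} + \mathbf{e}_f^{2n}\big)
   &= -\Delta t \Big[ (M_E^{f})^{-1} - \tfrac{\Delta t^{2}}{4} (M_E^{f})^{-1} B_f (M_H^{f})^{-1} B_f^* (M_E^{f})^{-1} \Big] C_f^*\,\mathbf{j} + \dots \\
\tfrac12\big(\mathbf{e}_c^{2n+2} + \mathbf{e}_c^{2n}\big) &= \Delta t\,(M_E^{c})^{-1} C_c^*\,\mathbf{j} + \dots
\end{aligned}
```

Applying $C_f$ to the fourth line and $C_c$ to the fifth, and inserting both into the averaged transmission condition, gives a linear system for $\mathbf{j}$ whose matrix is $\Delta t$ times the sum of $C_c (M_E^{c})^{-1} C_c^*$ and $C_f$ times the bracketed matrix times $C_f^*$.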



Finally, solving our scheme during each interval $[t^{2n}, t^{2n+2}]$ consists in:

— a succession of explicit steps in each grid separately,

— the resolution of the linear system (30). 
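
The two stages above can be tried out on a purely algebraic toy problem. The sketch below is an illustration under stated assumptions, not the implementation used for the computations: the matrices are random instead of being assembled from meshes, the sign conventions (fine source $+C_f^T j$, coarse source $-C_c^T j$) and the leapfrog form of the energy are assumed, and the function names and dictionary keys (`refinement_step`, `total_energy`, `demo`, `'MEc'`, ...) are hypothetical.

```python
import numpy as np

def refinement_step(m, ec, hc, ef, hf, dt):
    """Advance from t^{2n} to t^{2n+2}.

    m      : dict of 1D diagonal masses 'MEc','MHc','MEf','MHf' and dense 'Bc','Bf','Cc','Cf'
    ec, hc : e_c^{2n}, h_c^{2n-1};   ef, hf : e_f^{2n}, h_f^{2n-1/2}
    """
    def explicit(j):
        # two fine steps of size dt and one coarse step of size 2*dt, same j throughout
        hf1 = hf - dt * (m['Bf'].T @ ef) / m['MHf']
        ef1 = ef + dt * (m['Bf'] @ hf1 - m['Cf'].T @ j) / m['MEf']
        hf2 = hf1 - dt * (m['Bf'].T @ ef1) / m['MHf']
        ef2 = ef1 + dt * (m['Bf'] @ hf2 - m['Cf'].T @ j) / m['MEf']
        hc1 = hc - 2 * dt * (m['Bc'].T @ ec) / m['MHc']
        ec2 = ec + 2 * dt * (m['Bc'] @ hc1 + m['Cc'].T @ j) / m['MEc']
        return ec2, hc1, ef1, ef2, hf2

    def mismatch(j):
        # averaged continuity condition for the tangential electric field
        ec2, _, ef1, ef2, _ = explicit(j)
        return m['Cc'] @ (ec2 + ec) / 2 - m['Cf'] @ (ef2 + 2 * ef1 + ef) / 4

    # interface matrix: Cc MEc^{-1} Cc^T + Cf Q(dt) Cf^T
    K = (m['Bf'] / m['MHf']) @ m['Bf'].T
    Q = np.diag(1 / m['MEf']) - dt**2 / 4 * (K / m['MEf'][:, None]) / m['MEf']
    A = m['Cc'] @ (m['Cc'].T / m['MEc'][:, None]) + m['Cf'] @ Q @ m['Cf'].T
    # the mismatch is affine in j with slope dt*A, so one solve determines j
    j = np.linalg.solve(dt * A, -mismatch(np.zeros(A.shape[0])))
    ec2, hc1, _, ef2, hf2 = explicit(j)
    return ec2, hc1, ef2, hf2, j

def total_energy(m, ec, hc, ef, hf, dt):
    """Sum of fine and coarse leapfrog energies at an even instant."""
    hf_next = hf - dt * (m['Bf'].T @ ef) / m['MHf']
    hc_next = hc - 2 * dt * (m['Bc'].T @ ec) / m['MHc']
    fine = 0.5 * ef @ (m['MEf'] * ef) + 0.5 * hf_next @ (m['MHf'] * hf)
    coarse = 0.5 * ec @ (m['MEc'] * ec) + 0.5 * hc_next @ (m['MHc'] * hc)
    return fine + coarse

def demo(n_steps=20, dt=0.02, seed=1):
    rng = np.random.default_rng(seed)
    nEc, nHc, nEf, nHf, nL = 8, 7, 14, 12, 3
    m = {'MEc': rng.uniform(1, 2, nEc), 'MHc': rng.uniform(1, 2, nHc),
         'MEf': rng.uniform(1, 2, nEf), 'MHf': rng.uniform(1, 2, nHf),
         'Bc': rng.standard_normal((nEc, nHc)), 'Bf': rng.standard_normal((nEf, nHf)),
         'Cc': rng.standard_normal((nL, nEc)), 'Cf': rng.standard_normal((nL, nEf))}
    ec, hc = rng.standard_normal(nEc), rng.standard_normal(nHc)
    ef, hf = rng.standard_normal(nEf), rng.standard_normal(nHf)
    energies = [total_energy(m, ec, hc, ef, hf, dt)]
    for _ in range(n_steps):
        ec, hc, ef, hf, _ = refinement_step(m, ec, hc, ef, hf, dt)
        energies.append(total_energy(m, ec, hc, ef, hf, dt))
    return np.array(energies)
```

In this toy setting the total energy returned by `demo` stays constant up to round-off, which mirrors the conservation property that motivates the averaged transmission condition; a sparse solver would replace `np.linalg.solve` in a real computation.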
The existence and uniqueness of the discrete solution is guaranteed by the
non-singularity of the matrix of the symmetric system (30). It is easy to see
that this matrix is positive definite as soon as the matrix $Q(\Delta t)$ is positive
definite. This last property is equivalent to the matrix inequality:

$$ \frac{\Delta t^2}{4}\, B_f\, (M_H^{f})^{-1} B_f^* < M_E^{f}, $$

which is nothing but the (strict) CFL stability condition in the fine grid.
Abstract. RF circuits and systems are gaining importance because we are moving
further into a society where information is very important and should be available
any time and anywhere. In this paper we give an overview of RF circuit
simulation with an emphasis on noise simulation, which is an important functionality for RF
designers. Due to the high frequency signals, the standard circuit formulation
using Kirchhoff and lumped elements is not sufficient anymore to accurately predict
the behaviour of a design and Maxwell’s equations should be used. We give
several approximations of Maxwell’s equations and scenarios how the results can be
incorporated in RF circuit simulation.



1 Introduction 

High frequency applications are becoming increasingly important. This is 
caused by the fact that we are moving further into the information soci- 
ety where (digital) information is becoming very important. The first con- 
sequence is that large amounts of data should be transported, routed and 
processed at very high speeds. As an example, a switching array for optical 
data transmission may route 20 input signals each at 10 Gbit/s to one or 
many specified outputs. These switching arrays are still mainly implemented 
in silicon (or GaAs) and although we are dealing with digital circuits, the 
high frequency issues in this kind of circuits are analogue in nature and they 
have to be treated and analysed from an analogue viewpoint. 

A second consequence of the increasing importance of information is the no- 
tion that this information should be available any time and anywhere. There- 
fore there is a tremendous increase in wireless networks that allow flexible 
access to a wide variety of information. The increase in functionality of, for 
example, cell phones leads to larger amounts of data to be exchanged. The 
same holds for the emerging in-home wireless digital networks. 

Wireless transmission in general uses high frequency (RF) carriers usually 
in the range of 1-10 GHz. RF circuits are analogue circuits and should be 
treated as such. 

Finally, many high frequency designs are aimed at a consumer market. This 
has severe consequences for the design process: where in the past there was
time to build and measure several prototypes, nowadays the demands on
time-to-market, time-to-quality, price, production volume and production 
yield, etc. are very severe. Furthermore, the increasing complexity and the 
decreasing size of these systems makes measuring extremely difficult and time 
consuming. 

Therefore designers must be provided with analogue design environments that 
help them to design as quickly as possible a working circuit at first silicon. 
At the heart of these environments are the simulators. In order to predict the 
behaviour of a design as accurately as possible they should be provided with 
accurate models, not only of the non-linear components such as transistors 
but also of the physical structures which are part of the design. 

In this paper we will give an overview of RF circuit simulation. First we will 
describe two main characteristics of RF circuits and RF designs. Next we 
will look in more detail at an approach to RF simulation that is becoming 
increasingly popular. This is followed by an explanation of why and how the
physical implementation is incorporated in RF circuit simulation. 

2 Characteristics of RF circuits 

Clearly the most obvious characteristic of RF circuits is that they work 
at high frequencies. A consequence is that one has to use Maxwell’s equa- 
tions rather than Kirchhoff’s equations and lumped models. Although clearly 
Kirchhoff’s equations can be used without problems up to a certain frequency,
at RF frequencies they no longer accurately predict the behaviour of
a design. The physical structures required to implement a design, such as 
tracks on an IC, connectors, etc., begin to play an important role in the to- 
tal behaviour of the design. Since these structures in general do not have 
a regular shape, one has to resort to numerical methods to solve Maxwell’s 
equations and somehow incorporate the results in a simulation. 

A second characteristic is that RF signals have a broad but sparse spectrum
[10] with a dynamic range of more than 60 dB (a power ratio of 10^6). The
weakest signals might be almost lost in noise while the strongest signals will
introduce all kinds of spurious intermodulation components due to non-linearities which
are always present in a circuit. Noise and non-linear distortion translate to 
bit-error-rate in the transmitted data. Consequently, it is important that 
designers can predict the overall noise and distortion quickly and accurately. 

3 RF circuit simulation 

3.1 Overview of RF building blocks and specifications 

Although RF systems can be quite complicated, they are typically built from
a limited number of building blocks. When discussing RF simulation techniques
it is important to first determine the special properties and characteristics
of each building block and the information that should be obtained
in a simulation. The most important blocks are mixers (which perform a
frequency shift of the input signal, but also add non-linear intermodulation
products and noise), amplifiers and filters (which distort the signal and add
noise to it; power amplifiers may be strongly non-linear), dividers (strongly
non-linear circuits that modify a frequency reference signal) and oscillators
(autonomous circuits that generate a signal serving as a frequency reference,
often of very high accuracy). In all cases non-linearity and noise are important
issues. In the case of oscillators, noise present in the circuit manifests itself as
phase noise.

In the following section we will explain why and how these properties are 
specified. 



3.2 RF specifications 



In RF circuits the non-linearities are of course no different from non-linearities 
in other analog circuits. However they are usually expressed as intercept 
points like IP2 and IP3. The term ’intercept’ originates from a graphical 
construction which can be used to determine IP2 and IP3. The numbers ’2’
and ’3’ refer to the order of the intermodulation product which they define.
Clearly, these specifications should be determined using a fully non-linear
transient-like or harmonic balance-like simulation after which the result can 
be post-processed to yield the IP numbers. 
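
The post-processing step can be illustrated with the graphical construction mentioned above. The following is a minimal sketch, assuming a single two-tone simulation point in the small-signal regime, where the fundamental rises by 1 dB and the third-order product by 3 dB per dB of input power. The function name and the power levels are hypothetical and are not taken from any particular simulator.

```python
def intercept_point_3(p_in_dbm, p_fund_dbm, p_im3_dbm):
    """Return (IIP3, OIP3) in dBm from one two-tone data point.

    Assumes slopes of 1 dB/dB (fundamental) and 3 dB/dB (third-order
    intermodulation product), i.e. the extrapolated straight lines of
    the graphical intercept construction.
    """
    delta_db = p_fund_dbm - p_im3_dbm      # vertical distance between the lines
    iip3 = p_in_dbm + delta_db / 2.0       # input power at the intersection
    oip3 = p_fund_dbm + delta_db / 2.0     # output power at the intersection
    return iip3, oip3


# Hypothetical data: -20 dBm per tone in, -10 dBm fundamental, -70 dBm IM3 out.
iip3, oip3 = intercept_point_3(-20.0, -10.0, -70.0)
print(iip3, oip3)  # 10.0 20.0
```

The two lines differ in slope by 2 dB/dB, which is why half of their distance at the measured point gives the extrapolation to the intercept.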

More interesting is the noise in RF circuits. Noise consists of (usually) small 
unwanted signals in the circuit and originates in the devices that make up 
the circuit. For input/output circuits, a designer is usually interested in how 
much noise will be present or added in a certain, relatively narrow, frequency 
band of interest (FBOI). It will be the task of the simulator to accurately
determine what the final noise spectrum in the FBOI looks like. A special feature
of RF circuits is that spectra of input and noise signals are shifted around in
the frequency band due to wanted (e.g. mixers) and unwanted non-linearities
in combination with large signals. These signals can be internal to the circuit 
or be part of the input signal. Due to this noise folding, the noise in the FBOI 
might originate at completely different frequencies. The simulator must be 
able to handle noise folding. 

Where in input/output circuits a designer is usually interested in the noise
spectrum in a certain FBOI, when designing oscillators he is interested in the 
phase noise. As mentioned, an oscillator generates a frequency reference in 
the form of a periodic signal. Noise in the circuit causes the output signal to 
become noisy too. It is very important to notice that the noise might change 
the frequency of the output signal. This has severe consequences for simula- 
tion algorithms: noise is usually seen as a small perturbation on the noiseless 
solution which means that we can linearise the circuit. With oscillators this 
is no longer true as will be explained in the next section.

3.3 Basic RF simulation methods

Now that we have illustrated some important properties of RF circuits and 
signals it can be seen that conventional simulation methods like AC and 
transient simulation are not sufficient for simulating noise in RF circuits. 
AC simulation is fast but does not incorporate non-linearities nor frequency 
folding. Transient analysis could be used to simulate frequency folding but 
this would lead to prohibitively long simulation times if we want to obtain 
accurate information in the frequency domain. This is the reason why special 
RF simulation algorithms were developed. 

The objective of RF circuit simulation is to obtain solutions for the net- 
work variables (the voltages, currents, charges and fluxes) in the circuit and 
to study the effects due to noise. All the newly developed RF simulation 
methods somehow use the Periodic Steady-State solution (PSS) as a start- 
ing point. Conceptually, the PSS solution can be seen as a generalisation of 
the DC solution: where the DC solution describes the voltages and currents 
after infinite time in a circuit containing only DC sources, the PSS solution
describes the voltages and currents after infinite time in a circuit containing 
only periodic sources (which include DC sources). As with the DC solution, 
the PSS solution is useful in its own right but it is also used as a basis for 
other analyses (for example, periodic AC and periodic noise). This presents 
a basic two-step approach: 

— Firstly a (noiseless) PSS solution is determined which deals with the 
non-linearities in the circuit [2,3,7,8,10,13,14,17-19], 

— Secondly, a perturbation analysis is done around the PSS solution to 
analyse noise including frequency shifts [1,4-6,16]. 

It is important to notice that the PSS algorithms make a distinction between 
forced and oscillatory problems. In the former the period of the solution is 
known beforehand while in the latter, determining the (exact) period is part 
of the problem. In general the oscillatory problems are more difficult to solve 
[7,12,15]. Also the solution of perturbed oscillatory problems may not be 
periodic at all. 

The noiseless PSS problems are defined as finding a solution x(t) for systems 
of DAEs of the form 

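\[
\begin{cases}
\dfrac{d}{dt}\,q(t,x) + j(t,x) = 0 \in \mathbb{R}^N \\[4pt]
x(0) = x(T)
\end{cases}
\qquad
\begin{cases}
\dfrac{d}{dt}\,q(x) + j(x) = 0 \in \mathbb{R}^N \\[4pt]
x(0) = x(T)
\end{cases}
\tag{1}
\]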

where T > 0 is the period of the solution; x contains the node voltages and 
currents through voltage sources and through inductors; q describes the ca- 
pacitor charges as well as fluxes through inductors; j describes currents, or 
voltage differences, as well as effects of sources. The explicit dependence on
t, in the equations at the left, denotes that periodic sources are present in the
circuit (forced problems) and therefore the period T is known beforehand.
With oscillators (the autonomous equations at the right) this is not the case
and the period T should be solved together with x(t).

The PSS solution can be obtained in the frequency domain by determin- 
ing its Fourier coefficients by a Harmonic Balance approach. This has the 
advantage that one can easily deal with circuit components that are charac- 
terised in the frequency domain which is not uncommon in RF applications. 
The time domain methods more easily deal with strong non-linearities in 
the circuit and have good convergence properties. Of these methods we mention
Poincaré-map based methods, where an increase in speed is obtained by
applying vector acceleration methods such as Minimal Polynomial Extrapolation.
Typical points of attention are: restarting at consistent solutions (i.e.
solutions lying on the DAE manifold), as well as dealing with multiple oscillation
frequencies. Alternative methods are provided by (multiple) shooting methods,
or by applying a finite difference method. All methods can be enhanced to 
deal with oscillatory systems in which T is an additional unknown (and a 
gauge equation is added to the system). 
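
To make the time-domain approach concrete, the following is a minimal sketch of single shooting for a forced problem. It assumes a scalar ODE rather than a full DAE system, and uses a hypothetical circuit equation with a non-linear conductance; all function names are illustrative. The periodicity condition x(0) = x(T) is solved by a Newton iteration with a finite-difference derivative.

```python
import math


def integrate_one_period(f, x0, period, n_steps=400):
    """Integrate dx/dt = f(t, x) over one period with classical RK4."""
    h = period / n_steps
    t, x = 0.0, x0
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


def shooting_pss(f, period, x0=0.0, tol=1e-10, max_iter=20, eps=1e-6):
    """Newton iteration on the periodicity residual r(x0) = x(T; x0) - x0."""
    for _ in range(max_iter):
        r = integrate_one_period(f, x0, period) - x0
        if abs(r) < tol:
            return x0
        r_eps = integrate_one_period(f, x0 + eps, period) - (x0 + eps)
        x0 -= r * eps / (r_eps - r)        # Newton step, finite-difference slope
    raise RuntimeError("shooting iteration did not converge")


def rhs(t, x):
    # Hypothetical forced circuit: periodic drive, non-linear conductance,
    # time constant 0.2 (all values are assumptions for illustration).
    return (math.cos(2 * math.pi * t) - x - 0.5 * x ** 3) / 0.2


x_start = shooting_pss(rhs, period=1.0)
residual = integrate_one_period(rhs, x_start, 1.0) - x_start
print(x_start, residual)
```

For an oscillator the period would become an additional unknown and a gauge equation would be appended to the residual, as described above; that extension is not shown here.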

The PSS-solution is useful to, for example, determine the non-linearity of the 
circuit such as IP2, IP3 etc. (see Section 3.2). For studying effects due to
noise, the PSS solution is a first step in RF noise analysis in the two-step 
approach described before. An important effect of noise in RF circuits (see 
also Section 3.2) is noise folding: noise components are moved around the 
frequency band when they interact with other signals in the circuit. We now 
illustrate how a two-step approach based on a PSS solution is capable of re- 
producing this effect. 

Assume that we determined the (forced) PSS solution $x_{\mathrm{PSS}}(t)$ for (1). To
incorporate noise we add a perturbation term $n(t)$ and get

\[
\frac{d}{dt}\,q(t,x) + j(t,x) + n(t) = 0 \in \mathbb{R}^N. \tag{2}
\]

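A natural approach would be to assume that a small $n(t)$ also introduces a
small deviation to the large signal solution $x_{\mathrm{PSS}}(t)$ and to linearise (2) by
choosing $x(t) = x_{\mathrm{PSS}}(t) + x_n(t)$. We then find the following system for $x_n(t)$:

\[
\frac{d}{dt}\bigl(C(t)\,x_n\bigr) + G(t)\,x_n + n(t) = 0 \in \mathbb{R}^N, \tag{3}
\]

\[
C(t) = \left.\frac{\partial q(t,x)}{\partial x}\right|_{x_{\mathrm{PSS}}(t)}, \qquad
G(t) = \left.\frac{\partial j(t,x)}{\partial x}\right|_{x_{\mathrm{PSS}}(t)}. \tag{4}
\]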
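
The linearisation is a first-order Taylor expansion around the noiseless solution. The following sketch spells out the intermediate steps, assuming that q and j are differentiable in x, and writing C(t) and G(t) for their Jacobians with respect to x evaluated along the PSS solution:

```latex
\begin{align*}
q(t, x_{\mathrm{PSS}} + x_n) &\approx q(t, x_{\mathrm{PSS}}) + C(t)\,x_n, \\
j(t, x_{\mathrm{PSS}} + x_n) &\approx j(t, x_{\mathrm{PSS}}) + G(t)\,x_n, \\
\frac{d}{dt}\,q(t, x_{\mathrm{PSS}}) + j(t, x_{\mathrm{PSS}}) &= 0
  \quad \text{(noiseless PSS equation)}, \\
\Longrightarrow\quad
\frac{d}{dt}\bigl(C(t)\,x_n\bigr) + G(t)\,x_n + n(t) &\approx 0 .
\end{align*}
```

Second-order terms in $x_n$ are dropped, which is why the result is linear in the noise.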



It turns out that for forced systems this is a good approach because the period
T of the solution is completely determined by the input sources, and
the homogeneous problem only has the trivial solution. If we consider a typical
Fourier component of the noise source, $n(t) = U e^{j\nu t}$, one may consider
$y_n(t) = e^{-j\nu t}\,x_n(t)$, which satisfies a T-periodic system of equations

\[
\frac{d}{dt}\bigl(C(t)\,y_n\bigr) + \bigl[G(t) + j\nu\,C(t)\bigr]\,y_n + U = 0 \in \mathbb{R}^N \tag{5}
\]

(which is parametrized by $\nu$). It is clear that, for a single input frequency
$\nu$, the solution $x_n(t)$ contains frequencies of the form $(\nu + \omega_k)$ (in which
$\omega_k = 2\pi k/T$), i.e. frequency folding occurs. If we allow for several input
frequencies $\nu_i$, we can also say that a certain output frequency might originate
from a large number of possible input frequencies. Hence, noise components
at a certain frequency might end up in a different frequency band. This is
why, for example, 1/f noise, which has its main energy at low frequencies,
still plays an important role in RF circuits.
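
The bookkeeping behind frequency folding can be sketched in a few lines. The example below assumes frequencies in Hz instead of angular frequencies, a two-sided spectrum (so negative values appear) and a hypothetical 1 GHz fundamental; both function names are illustrative.

```python
def folded_frequencies(nu_hz, f0_hz, k_max):
    """Frequencies nu + k*f0, k = -k_max..k_max, reached by one noise tone."""
    return [nu_hz + k * f0_hz for k in range(-k_max, k_max + 1)]


def sources_for_output(f_out_hz, f0_hz, k_max):
    """Input frequencies that can fold onto one given output frequency."""
    return [f_out_hz - k * f0_hz for k in range(-k_max, k_max + 1)]


f0 = 1_000_000_000                      # hypothetical 1 GHz PSS fundamental, 1/T
print(folded_frequencies(1_000, f0, 2))
# [-1999999000, -999999000, 1000, 1000001000, 2000001000]
print(sources_for_output(1_000_001_000, f0, 2))
```

A 1 kHz noise component therefore also shows up 1 kHz away from the fundamental and its harmonics, which is how low-frequency noise reaches the RF band.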

It is important to note that we described a linear perturbation analysis and
we will not find contributions containing for example $(\nu_1 + \nu_2 + \omega_k)$,
$(\nu_1 + 2\nu_2 + \omega_k)$ etc. This assumption is in general not a severe limitation when
simulating noise in RF circuits.

When dealing with perturbed oscillatory systems

\[
\frac{d}{dt}\,q(x) + j(x) + n(t) = 0 \in \mathbb{R}^N \tag{6}
\]

it is no longer possible to assume that small perturbations $n(t)$ lead to small
deviations in $x_{\mathrm{PSS}}(t)$. [An instructive example is provided by considering
$y'(t) + \cos(t)\,y(t) - 1 = 0$, of which the inhomogeneous solution is not periodic
at all; however, note that $y(t + 2\pi)$ still satisfies the differential equation.]
The main reason is that the period of the large signal solution is influenced
by $n(t)$. This can lead to large (momentary) frequency deviations such that
the difference between the noiseless and noisy solution can no longer be considered
to be small.
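
The instructive example can be checked numerically. The sketch below integrates y'(t) = 1 - cos(t) y(t) with a classical Runge-Kutta scheme (the step count and the function name are assumptions). The solution grows by the same amount over every period, here 2*pi*I_0(1) ≈ 7.9549, so it cannot be periodic.

```python
import math


def advance(y, t0, t1, n_steps=2000):
    """RK4 integration of y' = 1 - cos(t) * y from t0 to t1."""
    def f(t, y):
        return 1.0 - math.cos(t) * y

    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


period = 2 * math.pi
y_one = advance(0.0, 0.0, period)            # y(2*pi) starting from y(0) = 0
y_two = advance(y_one, period, 2 * period)   # y(4*pi)
print(y_one, y_two - y_one)                  # same increment in both periods
```

The homogeneous solution exp(-sin t) is periodic, yet a bounded forcing term produces unbounded drift. This is the same mechanism that makes the phase of an oscillator drift under noise.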

In [5] a solution is given to deal with this problem by introducing an extra
term which describes the frequency (or phase) shift of the solution due to
$n(t)$. Hence, rather than assuming $x(t) = x_{\mathrm{PSS}}(t) + x_n(t)$ as a solution for
(6), we now assume $x(t) = x_{\mathrm{PSS}}(t + \alpha(t)) + x_n(t)$, where $\alpha(t)$ is a non-trivial
scalar function that has to be determined as part of the solution process and
leads to the phase noise of the system; $x_n$ represents the orbital deviation.
In order to arrive at an expression for the phase- or time-shift function $\alpha(t)$
(assumed to be sufficiently smooth), we define $s = t + \alpha(t)$ and
$y(t) = x_{\mathrm{PSS}}(s) = x_{\mathrm{PSS}}(t + \alpha(t))$. We observe that $y(t)$ satisfies

\[
\frac{d}{dt}\,q(y) + j(y) = C(t + \alpha(t))\,u_1(t + \alpha(t))\,\alpha'(t), \tag{7}
\]

in which $u_1(t) = x'_{\mathrm{PSS}}(t)$ is the tangent to the orbit. Clearly $y(t)$ itself
satisfies a perturbed differential equation. We note that $u_1(t)$ satisfies the
homogeneous part of (6), linearised around the noiseless PSS solution

\[
\frac{d}{dt}\bigl(C(t)\,x\bigr) + G(t)\,x = 0 \in \mathbb{R}^N, \tag{8}
\]

with C(t) and G(t) as defined in (4). It follows from Floquet theory [5,11]
that (8) has N independent solutions (in which $u_1$ coincides with our previously
introduced one) $u_1(t)e^{\mu_1 t}, \ldots, u_N(t)e^{\mu_N t}$, with T-periodic
$u_1(t), \ldots, u_N(t)$. In case
of a stable index 1 problem we can assume $\mu_1 = 0$ and $\mathrm{Re}(\mu_i) < 0$ for $i \geq 2$.
The adjoint system of (8)

\[
C^T(t)\,\frac{d}{dt}\,y - G^T(t)\,y = 0 \tag{9}
\]

has similar properties: $v_1(t)e^{-\mu_1 t}, \ldots, v_N(t)e^{-\mu_N t}$ are N
independent solutions. It should be noted that the vectors $v_i(t)$ and $u_j(t)$
satisfy a special bi-orthogonality relation with respect to C(t) and G(t),
namely

V(f)c(<)u(t) = o) , y{t)G{t)u{t) = ( j| 4 ) • (10) 

In applications, the noise (perturbation) term $n(t)$ in (6) has the form
$n(t) = B(x(t))\,b(t)$. It seems natural to decompose $B(x_{\mathrm{PSS}}(t + \alpha(t)))\,b(t)$ into
components along a basis of which one basic function is $C(t + \alpha(t))\,u_1(t + \alpha(t))$
(see (7)). By multiplying (7) and $B(x(t + \alpha(t)))\,b(t)$ by $v_1^T(t)$, the crucial
bi-orthogonality implies a non-linear, scalar, differential equation for $\alpha(t)$

\[
\alpha'(t) = -v_1^T(t + \alpha(t))\,B(x_{\mathrm{PSS}}(t + \alpha(t)))\,b(t), \qquad \alpha(0) = 0 \tag{11}
\]

from which $\alpha(t)$ can be determined. [The same bi-orthogonality also provides
an elegant way to determine $v_1(t)$, once $u_1(t)$ is known.] Note that if $b(t) = 0$
for $t > t_0$, then $\alpha$ becomes a constant phase shift, and the phase shifted
function $y(t)$ solves (6) exactly for $t > t_0$. In general, even for small b, the
phase shift function $\alpha(t)$ may increase with time. Because (11) is non-linear,
phase shifts from individual sources do not add up to give a group phase
shift.
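
The right-hand side of (7) follows from the chain rule. As a sketch, assume that $\alpha$ is differentiable, write $s = t + \alpha(t)$ and $y(t) = x_{\mathrm{PSS}}(s)$, let $C$ denote the Jacobian of $q$ with respect to $x$ along the PSS orbit, and let $u_1 = x'_{\mathrm{PSS}}$:

```latex
\begin{align*}
\frac{d}{dt}\,q\bigl(y(t)\bigr)
  &= \frac{d}{ds}\,q\bigl(x_{\mathrm{PSS}}(s)\bigr)\,\bigl(1 + \alpha'(t)\bigr), \\
\frac{d}{ds}\,q\bigl(x_{\mathrm{PSS}}(s)\bigr) + j\bigl(x_{\mathrm{PSS}}(s)\bigr)
  &= 0 \quad \text{(noiseless autonomous PSS equation)}, \\
\Longrightarrow\quad
\frac{d}{dt}\,q(y) + j(y)
  &= \frac{d}{ds}\,q\bigl(x_{\mathrm{PSS}}(s)\bigr)\,\alpha'(t)
   = C(s)\,u_1(s)\,\alpha'(t).
\end{align*}
```

The time-shifted orbit therefore satisfies the noiseless equation up to a term proportional to $\alpha'(t)$, directed along $C\,u_1$. This is the component against which the noise term is balanced to obtain the scalar equation for $\alpha$.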

In the above we assumed deterministic disturbances prescribed by the time
function $b(t)$. In noise analysis, however, the noise is usually not described
by time functions but by statistical properties such as mean and standard
deviation. In [5,6] the (stationary) autocorrelation of $y(t)$ is studied more
closely (here the * denotes complex conjugation). One derives (assuming real
$\alpha(t)$ and $X_j$ being the j-th Fourier coefficient of $x_{\mathrm{PSS}}(t)$)

\[
R_y(\tau) = \lim_{t \to \infty} E\bigl[y(t)\,y^*(t + \tau)\bigr] = \sum_{j=-\infty}^{\infty} R_j(\tau)
\]

with a corresponding relation between the spectral densities

\[
S_{x_{\mathrm{PSS}}}(\omega) = \sum_{j=-\infty}^{\infty} X_j X_j^*\,S_j(\omega + \omega_j),
\quad \text{where} \quad
\int_{-\infty}^{\infty} X_j X_j^*\,S_j(2\pi f + \omega_j)\,df = X_j X_j^*.
\]

The interesting point is that the above formulas do not require the explicit
evaluation of $\alpha(t)$. ’Only’ the variance $\sigma^2(\tau)$ of $\alpha(t)$ is needed, which can be
related to the contributions of the individual sources. This allows for deriving
approximating expressions for $S_j(\omega)$, and also gives a way to sum efficiently
to obtain group contributions.

In this section we summarised RF noise algorithms based on a two-step ap- 
proach: a PSS analysis followed by a linear (for forced systems) or non-linear 
(oscillatory systems) perturbation analysis. Apart from accurate simulation 
algorithms, a circuit simulator needs accurate models. On the one hand there
are the models of the non-linear components such as bipolar transistors, MOS
transistors, etc. Although these models are very important they are outside 
the scope of this paper. On the other hand, for accurate RF simulation we 
also require models of physical structures which are somehow generated by 
solving Maxwell’s equations. This will be the topic of the following section. 

4 Electromagnetic modelling for circuit simulation 

4.1 Concept of lumped elements 

In the previous sections we described circuit analysis that is based on Kirchhoff’s
voltage and current laws (KVL and KCL, respectively). However,
Maxwell’s equations are more fundamental and the electromagnetic (EM) 
field is the foundation of circuit theory and electronic modelling and simu- 
lation. In practice, a complete electromagnetic model of an electronic circuit 
is expensive to create and analyse and fortunately, electronic circuit theory 
has shown how to approximate many practical circuits by lumped element 
models: the energy-storage elements (inductors and capacitors) and the dissipative
elements (resistors) are connected to each other and to sources or active
elements within the circuit by conducting paths of negligible impedance.
So apparently, the distributed effects, inherent to the solution of Maxwell’s 
equations, in many real circuits can be represented by a few properly cho- 
sen lumped coupling elements. Circuit simulation is based on this concept 
of lumped elements, which ignores the electromagnetic interaction that is 
present within and between the physical circuit components and intercon- 
nections. 

A first level of refinement to the lumped approach is to model the real phys- 
ical interconnections (be it on-chip, in an IC package, a hybrid module or 
on a PCB) by means of lumped parasitic elements describing the conductor 
resistance and the capacitive and inductive coupling between the conductors. 
The ideal (lumped) circuit is extended with this parasitic network and can 
then be treated by the same network analysis and simulation tools. This ap- 
proach can be used when the individual elements and the total circuit are 
small compared to the wavelength of the signals (quasi-static approach) [22]. 
For structures comparable in size to the wavelength there are two effects
that play a role and cannot be taken into account using this approach:
distributed effects (compared to lumped) and retardation effects from
one part of the circuit to another.

Firstly, we consider the distributed effects. In general the lumped represen- 
tation of an element is valid if the region it occupies is small compared to 
the wavelength and when only one type of energy storage, either electric or 
magnetic is important in that region. If the electric energy storage in parts of 
a primarily inductive element, or magnetic energy in a primarily capacitive 
element, becomes important, the approach through classic circuit theory is 
to divide each element into sub-elements that can be treated as one or the
other. A good example is the capacitive coupling between the turns of an
inductor, which in a first approximation can be represented by adding a ca- 
pacitive element across the terminals of the inductor. A further improvement 
is to add capacitive elements between each pair of adjacent turns. 
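
The first approximation, one capacitor across the inductor terminals, already shows why the extra element matters at RF. The following is a minimal sketch with hypothetical component values (5 nH, 100 fF, 2 ohm series resistance); the function names are illustrative. It evaluates the terminal impedance and the approximate self-resonance frequency, above which the element no longer behaves as an inductor.

```python
import math


def inductor_impedance(freq_hz, l_h, c_f, r_ohm):
    """Impedance of (R + jwL) in parallel with a turn-to-turn capacitance C."""
    w = 2 * math.pi * freq_hz
    z_series = complex(r_ohm, w * l_h)
    y_total = 1 / z_series + complex(0.0, w * c_f)
    return 1 / y_total


def self_resonance_hz(l_h, c_f):
    """Approximate self-resonance frequency, neglecting the series resistance."""
    return 1 / (2 * math.pi * math.sqrt(l_h * c_f))


L_H, C_F, R_OHM = 5e-9, 100e-15, 2.0     # assumed values, illustration only
print(self_resonance_hz(L_H, C_F))        # roughly 7.1 GHz
print(inductor_impedance(1e9, L_H, C_F, R_OHM))    # inductive: imag > 0
print(inductor_impedance(20e9, L_H, C_F, R_OHM))   # capacitive: imag < 0
```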

Secondly, retardation effects, arising from the finite propagation time of elec- 
tromagnetic effects across the circuit, can cause phase delays in the circuit. If 
there is an in-phase component of the induced electric field (due to changing 
magnetic fields) and magnetic field (due to the current), this represents an en- 
ergy transfer, which is in fact the radiated energy. Another phenomenon that 
cannot easily be modelled with lumped parasitic elements is the presence of 
frequency dependent inhomogeneous current distributions in non-ideal con- 
ductors, e.g. due to skin effect. 

Electromagnetic simulation is aimed at overcoming the limitations of the 
lumped element and lumped parasitics approach. Electromagnetic simula- 
tors build an accurate spatial model of the physical structures of the circuit. 
The spatial model is accompanied by the material properties of the structural 
elements: conductivity, permittivity and permeability. In order to incorporate 
EM effects in circuit simulation, ports (or pins) are attached to the physical
structure, denoting the locations where e.g. lumped models of components 
or modules are to be attached. In many EM simulators, some approximation 
of Maxwell’s equations is solved. Therefore, in the following section we will 
explain some often used approximations. 

4.2 Maxwell’s equations and the Kirchhoff approximation

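Maxwell’s equations describe the electromagnetic field and are given by:

\begin{align}
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, && \text{(Faraday’s law)} \tag{12} \\
\nabla \times \mathbf{H} &= \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}, && \text{(Ampère’s law)} \tag{13} \\
\nabla \cdot \mathbf{B} &= 0, \tag{14} \\
\nabla \cdot \mathbf{D} &= \rho, && \text{(Gauss’ law)} \tag{15} \\
\mathbf{B} &= \mu \mathbf{H}, \quad \mathbf{D} = \varepsilon \mathbf{E}, && \text{(constitutive relations)} \tag{16} \\
\mathbf{J} &= \sigma \mathbf{E}, && \text{(Ohm’s law)} \tag{17}
\end{align}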

Here E and H are the electric and magnetic field, B and D are the magnetic
and electric flux densities, J and $\rho$ are the current and charge density,
and $\mu$, $\varepsilon$ and $\sigma$ are the material parameters permeability, permittivity and
conductivity, respectively. We can derive several approximations:

1. Assuming DC conditions ($\partial/\partial t = 0$) and taking the divergence of (13)
we get $\nabla \cdot \mathbf{J} = 0$, and with Gauss’ theorem over a closed surface S

\[
\iiint_V \nabla \cdot \mathbf{J}\,dV = \iint_S \mathbf{J} \cdot d\mathbf{S} = 0. \tag{18}
\]

Since the only current flowing out of the surface is in the wires, this gives
Kirchhoff’s current law (KCL), which simply states that the algebraic
sum of currents flowing out of a circuit junction is zero.

From Faraday’s law, we can introduce the potential (voltage) V according
to $\mathbf{E} = -\nabla V$ and with Stokes’ theorem over a closed loop $l$ we get:

\[
\iint_A \nabla \times \mathbf{E} \cdot d\mathbf{A} = \oint_l \mathbf{E} \cdot d\mathbf{l} = 0. \tag{19}
\]



This gives Kirchhoff’s voltage law (KVL), which states that for any closed 
loop of a circuit, the algebraic sum of the voltages for the individual 
branches of the loop is zero. These two laws provide the basis for classical 
circuit theory. 

2. If we only neglect the displacement current $\partial \mathbf{D}/\partial t$ in (13) we get the
quasi-static approach. We still obtain the KCL by taking the divergence
of (13). From (14) we can write $\mathbf{B} = \nabla \times \mathbf{A}$, where A is a magnetic vector
potential. When we substitute this in (12), we get

\[
\nabla \times \left(\mathbf{E} + \frac{\partial \mathbf{A}}{\partial t}\right) = 0 = -\nabla \times \nabla V, \tag{20}
\]

which means that in this case we can also define a scalar electric potential
V by:

\[
\mathbf{E} = -\nabla V - \frac{\partial \mathbf{A}}{\partial t}, \tag{21}
\]

where V fulfils the KVL. Taking the curl of (12) and substituting
(13) and taking the curl of (13), we get the equations for E and H:

\[
\Delta \mathbf{E} = \mu\sigma\,\frac{\partial \mathbf{E}}{\partial t} = \mu\,\frac{\partial \mathbf{J}}{\partial t},
\qquad
\Delta \mathbf{H} = -\nabla \times \mathbf{J}, \tag{22}
\]

where

\[
\Delta = \nabla^2. \tag{23}
\]

This gives the typical eddy current solutions with skin depth $\delta = 1/\sqrt{\pi f \mu \sigma}$
inside conductors with conductivity $\sigma$. Currents will run on the edge of
the conductor within a depth $\delta$ and fields hardly penetrate any deeper
into the conductor than this skin depth. This approach is required for frequencies
$\omega > 1/(\mu \sigma r^2)$, where r is the thickness of the interconnect. For
frequencies high enough so that the current distribution is not uniform
anymore (interconnect is thick compared to the skin depth), the resistance
and internal reactance will become frequency dependent, since they will
be determined by the skin depth. This approximation can also be seen
as the infinite wavelength approximation of the wave equation solution and
is applicable when the wavelength is much larger than the dimension d
of the problem: $\lambda \gg d$ or frequencies $\omega \ll c/d$, the so-called quasi-static
approach. For higher frequencies, radiation losses become important and
this is not taken into account by the quasi-static approach.

3. Clearly it is also possible to solve Maxwell’s equations completely without
any approximations which leads to the Helmholtz wave equations:

\[
\Delta \mathbf{E} - \frac{1}{c^2}\,\frac{\partial^2 \mathbf{E}}{\partial t^2}
  = \mu\,\frac{\partial \mathbf{J}}{\partial t} + \frac{\nabla \rho}{\varepsilon},
\qquad
\Delta \mathbf{H} - \frac{1}{c^2}\,\frac{\partial^2 \mathbf{H}}{\partial t^2}
  = -\nabla \times \mathbf{J}. \tag{24}
\]

The solution of this set of equations is called the full-wave solution.
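
To get a feeling for the numbers in the quasi-static case, the skin depth formula from the second approximation can be evaluated directly. This is a minimal sketch assuming a non-magnetic conductor with a conductivity of 5.8e7 S/m (a commonly quoted value for copper) and a hypothetical 1 micrometre track thickness; neither value comes from the text.

```python
import math

MU_0 = 4e-7 * math.pi        # vacuum permeability in H/m (classical value)


def skin_depth(freq_hz, sigma_s_per_m, mu_r=1.0):
    """Skin depth 1/sqrt(pi * f * mu * sigma) in metres."""
    return 1.0 / math.sqrt(math.pi * freq_hz * mu_r * MU_0 * sigma_s_per_m)


SIGMA = 5.8e7                # assumed conductivity in S/m
THICKNESS = 1.0e-6           # assumed track thickness in m

for f in (1e9, 10e9):
    d = skin_depth(f, SIGMA)
    regime = "uniform current" if d > THICKNESS else "skin effect matters"
    print(f"{f / 1e9:5.1f} GHz: skin depth {d * 1e6:.2f} um ({regime})")
```

With these assumptions the skin depth is about 2.1 micrometres at 1 GHz and about 0.66 micrometres at 10 GHz, so the resistance of such a track becomes frequency dependent somewhere in the 1-10 GHz range.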



According to the analysis above, electromagnetic simulation tools may be 
characterised as either quasi-static or full-wave. Quasi-static simulators only 
consider free charges and currents on electrical conductors. By contrast, full- 
wave simulators account for the propagation of electromagnetic waves in free 
space and dielectric materials. Radiation losses are not naturally incorpo- 
rated in the quasi-static model and radiated electromagnetic fields can only 
be approximated by post-processing on the calculated charges and currents. 
These phenomena are naturally incorporated in full-wave tools. A common
problem in full-wave electromagnetic simulation tools is their computational 
complexity: because of the very fine spatial discretisation, the size of the 
mathematical equations that have to be solved is enormous. Reduced order 
modelling (ROM), the construction of a simplified system to approximate the 
original system with reasonable accuracy, will be necessary to cope with this 
[23-25]. Also for large quasi-static problems, ROM will be necessary. 



4.3 Electromagnetic simulation tools for RF circuits 

To assess the need for on-chip EM simulation let us assume that RF sig- 
nals of interest for present-day businesses are in the frequency range of 1-10 
GHz. Digital signals may have pulse edges in the order of 100 ps. For a 
straight TEM transmission-line consisting of ideal conductors embedded in a
homogeneous dielectric medium the propagation speed of the signal is $c_0/\sqrt{\varepsilon_r}$,
where $\varepsilon_r \approx 2$ to $4$ for state of the art RFIC processes. As a result, the
wavelength of the 10 GHz RF signal is of the order of 15-20 mm. Likewise, the
distance a digital pulse travels in 100 ps is some 15-20 mm. The die size is
roughly 10 mm$^2$. The maximum geometrical length of digital signal lines on
the chip is comparable to the figures shown above. On the other hand, on- 
chip RF interconnections are likely to be only a fraction of the die size and 
the largest components, spiral inductors, typically measure only 0.2-0.5 mm
across. Consequently, from this perspective it is unlikely that it is necessary
to apply full-wave electromagnetic simulation to on-chip components and interconnect
structures. Long RF interconnections will be designed as coplanar
wave-guides or similar structures and as long as the TEM propagation mode 
is dominant and the losses are small, the quasi-static approximation probably 
remains valid, even for very high frequencies. 
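
The figures quoted in this assessment can be reproduced with a short calculation. The sketch below assumes an ideal TEM line in a homogeneous dielectric, so that the propagation speed is the vacuum speed of light divided by the square root of the relative permittivity; the function names are illustrative.

```python
import math

C0 = 299_792_458.0           # speed of light in vacuum, m/s


def propagation_speed(eps_r):
    """Signal speed on an ideal TEM line in a homogeneous dielectric."""
    return C0 / math.sqrt(eps_r)


def wavelength_mm(freq_hz, eps_r):
    return 1e3 * propagation_speed(eps_r) / freq_hz


def distance_mm(time_s, eps_r):
    return 1e3 * propagation_speed(eps_r) * time_s


for eps_r in (2.0, 4.0):
    print(eps_r, wavelength_mm(10e9, eps_r), distance_mm(100e-12, eps_r))
```

For a relative permittivity between 2 and 4 both quantities lie between roughly 15 mm and 21 mm, two orders of magnitude larger than a 0.2-0.5 mm spiral inductor.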

Nevertheless, in order to tackle interconnect related problems Maxwell’s equations
(quasi-static and possibly full-wave) must be solved, which can be done
in the frequency domain or in the time domain. Within the frequency domain
methods one assumes a harmonic time dependency, such that all time
derivatives can be replaced by $j\omega$. Some of the most popular frequency domain
methods are the Finite Difference (FDM) and Finite Elements (FEM)
Method, as well as the Boundary Elements Method (BEM), Method of Mo- 
ments (MoM) and Spectral Domain Analysis (SDA) [26,27]. The first two 
methods are differential equation schemes, the latter ones integral equation 
schemes. Well-known time domain methods are the Finite Difference Time 
Domain algorithm (FDTD) and the Time Domain Transmission Line Method 
(TDTLM). 

The FDM approximates the differential operators by finite differences. It is
easy to implement and applicable to general configurations, but it has difficulty
handling curved boundaries. It may also need a large mesh volume to
implement the absorbing boundary conditions (ABC) for unbounded problems.
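
As an illustration of the finite difference idea, the sketch below solves the static Laplace equation for the potential on a square grid, replacing the Laplace operator by the five-point stencil and iterating with Gauss-Seidel sweeps. The grid size, the boundary values (1 V on one edge, 0 V on the others) and the function name are assumptions chosen for the example; a frequency domain field solver would assemble a similar, but complex-valued, system.

```python
def solve_laplace(n=11, v_top=1.0, sweeps=500):
    """Gauss-Seidel solution of the 5-point Laplace stencil on an n x n grid."""
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v[0][j] = v_top                  # Dirichlet condition on the top edge
    for _ in range(sweeps):
        for i in range(1, n - 1):        # interior nodes only
            for j in range(1, n - 1):
                v[i][j] = 0.25 * (v[i - 1][j] + v[i + 1][j]
                                  + v[i][j - 1] + v[i][j + 1])
    return v


potential = solve_laplace()
print(potential[5][5])                   # centre node of the 11 x 11 grid
```

By symmetry and superposition the centre node must equal one quarter of the applied voltage, which gives a simple check on the implementation.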

In the FEM, the solution domain is discretized into elements [28]. Making 
use of interpolation functions (shape functions) , each element is mapped into 
a basic standard element. The unknown fields are locally expressed in terms 
of the interpolation functions over each individual element. By applying a 
variational or Galerkin procedure, a set of algebraic equations, described by 
sparse matrices, is obtained. The Galerkin method is in fact a weighted resid- 
ual procedure with trial functions equal to the weighting functions and is one 
of the most widely used methods. This method is also the most general. It 
can handle curved boundaries and arbitrary inhomogeneous material distri- 
butions. As with the FDM, one has to take care with ABC for unbounded 
problems. 
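
The Galerkin procedure can be sketched for the simplest possible case. The example assumes the one-dimensional model problem -u'' = 1 on (0, 1) with u(0) = u(1) = 0 and piecewise linear shape functions on a uniform mesh; this is a stand-in for the field equations, not a solver for them. The resulting sparse (here tridiagonal) system is solved with the Thomas algorithm.

```python
def fem_poisson_1d(n_elements=8):
    """Linear-element Galerkin solution of -u'' = 1, u(0) = u(1) = 0."""
    h = 1.0 / n_elements
    m = n_elements - 1                   # number of interior nodes
    diag = [2.0 / h] * m                 # stiffness matrix diagonal
    off = -1.0 / h                       # stiffness matrix off-diagonal
    rhs = [h] * m                        # load vector: integral of each hat function
    for i in range(1, m):                # Thomas algorithm, forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):       # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]             # add the Dirichlet boundary nodes


nodes = fem_poisson_1d()
print(nodes[4])                          # value at x = 0.5, exact: 0.125
```

For this model problem the nodal values coincide with the exact solution u(x) = x(1 - x)/2. In two and three dimensions the same assembly produces large sparse matrices, for which iterative solvers become attractive.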

When the FEM is applied to the boundary integral equations, this results 
in the BEM [29]. However, instead of large sparse matrices, where itera- 
tive solvers can be applied, dense matrices are obtained, which in general 
are smaller. This method can handle arbitrary curved boundaries and small 
mesh volumes, due to the integral equation approach. However, inhomoge- 
neous material distributions are more difficult to handle. 

The MoM also employs the method of weighted residuals [30]. The method 
starts by establishing a set of trial functions with one or more variables. The 
residuals are a measure of the difference between trial and true solution. The 
variable parameters are determined in a manner that guarantees a best fit of 
the trial functions based on a minimisation of residuals. 

In the SDA method, the integral equation is derived from the time-harmonic 
Maxwell’s equations or the Helmholtz wave equations using Green’s functions 
[31]. In this method only the surface of conductors needs to be discretized, 
resulting in a small dense matrix equation and thus a fast scheme. The ma- 
jor drawback is that it cannot easily be generalised, since Green’s functions 
may not be available for a general configuration and inhomogeneous material 
distribution. 

For integral equation methods, the so-called Fast Multipole Methods (FMM)
can speed up the calculations and make large and complex geometries computable.
The main idea behind these methods is the fact that the effect of
all points working on all points yields a method with $O(N^2)$ computations,
whereas if the points are well separated, one can cluster the ’far-away’ points.
This yields a method of at most $O(N \log N)$ [32].

The Finite-Difference Time-Domain (FDTD) method is currently one of the 
most popular approaches. As first proposed by Yee in 1966 [33], it uses the 
differential form of Maxwell’s equations. Yee used an electric field (E) grid 
which was offset both spatially and temporally from a magnetic field (H) grid 
to obtain update equations that yield the present fields throughout the com- 
putational domain in terms of the past fields. The initial lack of attention, 
in spite of the simplicity and elegance of Yee’s approach, can be attributed 
to the high computational cost. However, recently many shortcomings of the 
original FDTD method were alleviated leading to a near exponential growth 
in publications in the past ten years [34]. 

Finally, the TDTLM method models the spatial electromagnetic field in terms
of a distributed transmission line network after discretizing the solution
domain [35]. Basically this method has the same limitations as the FDTD
method.

Unfortunately, however, most of the currently available 2.5D electromagnetic 
simulators (such as those based on MoM or FDTD) are only suited for rela- 
tively small interconnect structures due to large computing times. 



4.4 Coupling EM simulators to circuit simulators 

The purpose of our explanation on EM simulation was to include the in- 
fluence of physical structures such as interconnect in RF circuit simulation. 
This means that somehow we have to connect the EM simulator, or the re- 
sults thereof, to the circuit simulator. In this section we give some possible 
approaches. 

A possible solution would be to tightly couple a circuit simulator with an 
EM simulator (co-simulation). The idea is based on an iterative approach 
in the time domain and requires costly computations and hence in general 
it is not an attractive approach. However, when antennas are integrated on 
silicon [36], electromagnetic and circuit analysis must be combined. By its
very nature (conversion of electromagnetic radiation into electrical energy
and vice versa) an antenna cannot be treated as a parasitic component and
co-simulation will be necessary.

Another approach is to use an EM simulator as a stand-alone tool to generate 
a compact model which can be incorporated in the circuit simulator in addi- 
tion to the lumped circuit components connected to the physical structure. 
This requires a robust method to catch the behaviour of the EM model at 
the ports. Most of the present EM simulators can produce Y- or S-parameter 
models that can be incorporated in a circuit simulator and they are most 
easily used in (linear or non-linear) frequency domain analyses such as AC 
and Harmonic Balance. For non-linear time-domain methods, such as PSS, we
need to represent the behaviour as a system of DAEs. There are a number of
possible ways to couple this to the simulator. One can generate these
equations directly from the S-parameters or via a behavioural modelling
step. Some simulators calculate the impulse response of an S-parameter
component by the inverse Laplace transform and apply numerical convolution
during transient simulation.
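
The convolution step can be illustrated with a deliberately simple case. The
sketch below assumes a one-port with the first-order impulse response
h(t) = exp(-t/tau)/tau (a hypothetical stand-in for the inverse Laplace
transform of a measured or simulated response) and a unit-step excitation, for
which the exact output 1 - exp(-t/tau) is known. It is not the scheme of any
particular simulator.

```python
import math


def convolve(h, x, dt):
    """Trapezoidal approximation of y(t_n) = int_0^{t_n} h(s) x(t_n - s) ds."""
    y = [0.0]  # the integral over an interval of zero length vanishes
    for n in range(1, len(x)):
        acc = 0.5 * (h[0] * x[n] + h[n] * x[0])           # end points
        acc += sum(h[k] * x[n - k] for k in range(1, n))  # interior points
        y.append(acc * dt)
    return y


tau = 1.0e-9       # time constant of the assumed response, in seconds
dt = tau / 100.0   # time step of the transient simulation
steps = 501        # simulate up to t = 5 * tau

h = [math.exp(-k * dt / tau) / tau for k in range(steps)]  # sampled h(t)
x = [1.0] * steps                                          # unit step input
y = convolve(h, x, dt)

print(y[-1], 1.0 - math.exp(-5.0))  # numerical value next to the exact one
```

The cost of this direct convolution grows quadratically with the number of
time steps, which is one reason why a compact DAE or equivalent-circuit
description can be preferable for long transient runs.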

Other tools are capable of creating an equivalent circuit model, i.e. an RLC
network. This type of model is more versatile but more difficult to generate.
The size of the model could be a problem, as could a loss of passivity, which
can make the circuit simulation unstable.
Instead of mapping onto a real network, the model could also be described 
in terms of the G- and C matrices which can be incorporated in the circuit 
simulator. 

Interesting work was done in coupling an FDTD simulator to the circuit sim- 
ulator where a behavioural model approach was used to generate a lumped 
element model [38]. 

A limitation of using an EM simulator as stand-alone tool is that the actual 
EM model is never exercised in its circuit context, which means that the cur- 
rent distribution in the physical structures is not explicitly known. However, 
some EM simulators can reconstruct the geometrical current distribution and 
EM radiation from the currents at the model ports and create a visual rendi- 
tion of them. To use this feature it is necessary to make the circuit simulation 
results available to the EM simulator [37]. 



4.5 Future developments 

Currently the most common applications, where an EM simulator generates
some kind of model for the circuit simulator, are the simulation of printed
circuit boards and microwave circuits. The physical structures in these
applications consist of planar metallisation structures on (almost) loss-less
substrates. The most successful methods to analyze these usually large
structures (e.g. complete multi-layer PCBs) are based on 2.5D boundary element
analysis. These methods produce either a rational approximation of the S- 
parameters or a reduced lumped element model by using a fit procedure 
(Momentum-RF, Fasterix). In case of a lumped circuit model, the resulting 
circuit is in general not passive. Reduced order modelling techniques that 
guarantee stability, such as PVL [25], will need to be used. 

For modelling physical structures on an IC, one of the most promising solu- 
tions seems to be to adjust the tools and methods used for the planar struc- 
tures. MoM and FDTD seem to be less favourable candidates for ICs, due to 
high computational cost. In order to be able to expand the above-mentioned 
tools for simulation of ICs, several problems need to be solved: 

— 3D effects: due to the shape of IC interconnect, the effects of the side-walls 
have to be taken into account. 

— Substrate effects: the actual IC interconnect behaviour can only be cal- 
culated when the effects of the silicon substrate are taken into account 
(displacement and induced (eddy) currents and concomitant losses). The
silicon substrate cannot be approximated by an ideal conductive plane.
The resistance of the substrate is high, leading to dispersion and
frequency-dependent attenuation in signal lines and to reduced Q-factors
in passive components, such as spiral inductors. A number of models to
account for substrate effects have already been developed but need to be
coupled to the interconnect analysis tools.

— These 3D and substrate effects increase the complexity of the problem. 
Therefore robust methods should be developed to reduce the complexity 
of the final model which is used in the circuit simulator. The stability of 
these models should be a point of attention.



Abstract. We review some recent extensions of the Finite Integration Technique
(FIT), which is known to be a generalization of the Finite Difference Time Domain 
(FDTD) method. Some shortcomings of the standard formulation are discussed 
which limit the applicability or at least the efficiency of the time domain variant of 
FIT. The novel developments which are proposed in this paper cover both the basic 
geometrical modeling in space and time and advanced methods to solve the alge- 
braic problems in time and frequency domain. A numerical application is presented 
to demonstrate the performance of the algorithms in the high frequency regime. 



1 Introduction 

For many high frequency electromagnetic applications — especially whenever 
broadband results are to be calculated — time domain simulation methods 
can be considered as the most suited approach. This is reflected by the great 
popularity of the Finite Difference Time Domain algorithm (FDTD, [1]) and 
the vast number of publications in this area. 

An alternative formulation that leads to a computationally equivalent 
update scheme in time domain is supplied by the Finite Integration Technique 
(FIT [2,3]). Due to its general approach, FIT can also be applied to static 
or time-harmonic problems or be combined with alternative time integration 
schemes, e.g. for low-frequency problems. 

However, there are also several shortcomings of the standard FIT (or 
FDTD) formulation. Two of them are the poor modeling quality for arbitrar- 
ily shaped geometric objects, if Cartesian grids are used (’staircase’-problem), 
and the reduced efficiency in the simulation of resonant structures, where the 
long transients lead to a long simulation time. 

In the following we first give a short introduction to the notation and
some basic properties of the FIT. In the main part we discuss some recent
developments addressing the problems mentioned above: techniques to increase
the modeling accuracy of the FIT for curved boundaries (section 2), and to
simulate highly resonant structures (section 3). Finally we present an
application (section 4) where several of the proposed techniques are applied
to obtain accurate simulation results with moderate computing resources.



1.1 Maxwell Grid Equations



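Similar to the FDTD method, the Finite Integration Technique uses a pair
of staggered computational grids, the primary grid $G$ and the dual grid
$\widetilde{G}$, which however can have a more general structure than the
standard 'Yee cell' of FDTD. The state variables of FIT are so-called grid
voltages and grid fluxes, which are defined as integrals of the electric and
magnetic fields on edges $L_i$, $\widetilde{L}_i$ or faces $A_i$,
$\widetilde{A}_i$ of $G$ and $\widetilde{G}$, respectively:

\[ e_i = \int_{L_i} \mathbf{E}\cdot d\mathbf{s}, \qquad
   b_i = \int_{A_i} \mathbf{B}\cdot d\mathbf{A}, \tag{1a} \]

\[ d_i = \int_{\widetilde{A}_i} \mathbf{D}\cdot d\mathbf{A}, \qquad
   h_i = \int_{\widetilde{L}_i} \mathbf{H}\cdot d\mathbf{s}, \qquad
   j_i = \int_{\widetilde{A}_i} \mathbf{J}\cdot d\mathbf{A}. \tag{1b} \]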



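Using this kind of state variables, an exact representation of the integral
form of Faraday's and Ampere's law applied to facets of the grids can be
found. In matrix-vector form, the so-called Maxwell Grid Equations [3] read

\[ \mathbf{C}\,\mathbf{e} = -\frac{d}{dt}\mathbf{b}, \qquad
   \widetilde{\mathbf{C}}\,\mathbf{h} = \frac{d}{dt}\mathbf{d} + \mathbf{j}, \tag{2} \]

with the curl-matrices $\mathbf{C}$ and $\widetilde{\mathbf{C}} = \mathbf{C}^T$,
and the vectors of voltages and fluxes e, h and d, b, j, respectively. The
approximations of the method finally take place in the material matrices (the
discrete constitutive relations), given here for the linear case:

\[ \mathbf{d} = \mathbf{M}_\varepsilon\,\mathbf{e}, \qquad
   \mathbf{b} = \mathbf{M}_\mu\,\mathbf{h}, \qquad
   \mathbf{j} = \mathbf{M}_\kappa\,\mathbf{e} + \mathbf{j}_s \tag{3} \]

($\mathbf{j}_s$ = source currents). For dual orthogonal grid systems, where
edges of $G$ and facets of $\widetilde{G}$ (and vice versa) intersect at 90°,
$\mathbf{M}_\varepsilon$, $\mathbf{M}_\mu$, and $\mathbf{M}_\kappa$ are
diagonal matrices which can be easily inverted.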

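Equations (2) can be symmetrized and rewritten in one large system:

\[ \frac{d}{dt}\mathbf{x} =
   \underbrace{\begin{pmatrix}
     -\mathbf{M}_\varepsilon^{-1/2}\mathbf{M}_\kappa\mathbf{M}_\varepsilon^{-1/2} &
      \mathbf{M}_\varepsilon^{-1/2}\widetilde{\mathbf{C}}\,\mathbf{M}_\mu^{-1/2} \\
     -\mathbf{M}_\mu^{-1/2}\mathbf{C}\,\mathbf{M}_\varepsilon^{-1/2} & 0
   \end{pmatrix}}_{\mathbf{A}} \mathbf{x} + \mathbf{b} \tag{4} \]

with the state vector x and the excitation b:

\[ \mathbf{x} = \begin{pmatrix} \mathbf{M}_\varepsilon^{1/2}\,\mathbf{e} \\
                                \mathbf{M}_\mu^{1/2}\,\mathbf{h} \end{pmatrix},
   \qquad
   \mathbf{b} = \begin{pmatrix} -\mathbf{M}_\varepsilon^{-1/2}\,\mathbf{j}_s \\
                                0 \end{pmatrix}. \tag{5} \]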



1.2 Formulations for the High-Frequency Regime 

Frequency Domain. In frequency domain we can eliminate the magnetic voltages
h in (2) and obtain the discrete curl-curl equation which reads (for the
lossless case with $\mathbf{M}_\kappa = 0$):

\[ \bigl(\underbrace{\mathbf{M}_\varepsilon^{-1}\widetilde{\mathbf{C}}\,
   \mathbf{M}_\mu^{-1}\mathbf{C}}_{\mathbf{A}_{cc}} - \omega^2\mathbf{I}\bigr)\,
   \mathbf{e} = -j\omega\,\mathbf{M}_\varepsilon^{-1}\,\mathbf{j}_s. \tag{6} \]

It can be easily shown that the curl-curl system matrix $\mathbf{A}_{cc}$ has
only real eigenfrequencies if the material operators $\mathbf{M}_\varepsilon$
and $\mathbf{M}_\mu$ are symmetric and positive definite. This important
property, which is referred to as spatial stability [3], is responsible for
the long-time stability of the FIT system in time domain. In frequency domain
these eigensolutions correspond to undamped oscillating resonant modes in
lossless structures. It is a standard task of many electromagnetic simulation
codes to calculate the dominant eigensolutions of $\mathbf{A}_{cc}$, i.e. the
resonant modes with lowest frequencies $\omega_i > 0$. Solving the
inhomogeneous curl-curl equation (6) for current-driven devices at a fixed
frequency $\omega_0$ is often quite an expensive task, and the development of
robust algebraic solvers including appropriate preconditioners is still a
topic of ongoing research [4].
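
The argument behind the real eigenfrequencies can be written out in one line.
The sketch below is a standard similarity argument, given here under the
stated assumptions that both material matrices are symmetric positive definite
and that the dual curl matrix is the transpose of the primary one; the
auxiliary matrix K is introduced only for this illustration.

```latex
\mathbf{M}_\varepsilon^{1/2}\,\mathbf{A}_{cc}\,\mathbf{M}_\varepsilon^{-1/2}
  = \mathbf{M}_\varepsilon^{-1/2}\,\mathbf{C}^T\mathbf{M}_\mu^{-1}\mathbf{C}\,
    \mathbf{M}_\varepsilon^{-1/2}
  = \mathbf{K}^T\mathbf{K},
\qquad
\mathbf{K} = \mathbf{M}_\mu^{-1/2}\,\mathbf{C}\,\mathbf{M}_\varepsilon^{-1/2}.
```

The curl-curl matrix is therefore similar to a symmetric positive semidefinite
matrix, so its eigenvalues $\omega_i^2$ are real and non-negative and the
eigenfrequencies $\omega_i$ are real.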

Time Domain. In time domain, we can approximate the time derivatives
in (2) by central differences on a staggered time grid
$t^{(m)} = t_0 + m\,\Delta t$ and obtain the update formulas of the leapfrog
scheme:

\[ \mathbf{h}^{(m+1)} = \mathbf{h}^{(m)}
   - \Delta t\,\mathbf{M}_\mu^{-1}\mathbf{C}\,\mathbf{e}^{(m+1/2)}, \tag{7a} \]

\[ \mathbf{e}^{(m+3/2)} = \mathbf{e}^{(m+1/2)}
   + \Delta t\,\mathbf{M}_\varepsilon^{-1}\bigl(\widetilde{\mathbf{C}}\,
   \mathbf{h}^{(m+1)} - \mathbf{j}^{(m+1)}\bigr). \tag{7b} \]

The size of the time step $\Delta t$ is limited by the well-known Courant
criterion for stability reasons. Combined with the spatial operator matrices
for a Cartesian grid (the 'Yee cell'), these formulas and the FDTD scheme
can be shown to be computationally equivalent [5].
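
The structure of the leapfrog updates and the role of the time-step limit can
be tried out on a one-dimensional toy grid. The sketch below assumes
normalized units (unit grid step, unit wave speed, identity material matrices,
no sources) and perfectly conducting ends; the function name is hypothetical.
In this setting the stability limit is a time step of one.

```python
def leapfrog_peak(dt, n_cells=50, n_steps=200):
    """Run the source-free updates (7a), (7b) on a 1D grid and return the
    largest electric grid voltage seen during the run."""
    e = [0.0] * (n_cells + 1)  # voltages at the nodes, e[0] = e[-1] = 0 (PEC)
    h = [0.0] * n_cells        # magnetic voltages between the nodes
    e[n_cells // 2] = 1.0      # a single-node spike excites all grid modes
    peak = 0.0
    for _ in range(n_steps):
        for k in range(n_cells):             # (7a): h <- h - dt * (C e)
            h[k] -= dt * (e[k + 1] - e[k])
        for k in range(1, n_cells):          # (7b): e <- e + dt * (C^T h)
            e[k] += dt * (h[k - 1] - h[k])
        peak = max(peak, max(abs(v) for v in e))
    return peak


stable = leapfrog_peak(0.9)    # below the limit: the field stays bounded
unstable = leapfrog_peak(1.1)  # above the limit: the field grows without bound
print(f"peak field for dt = 0.9: {stable:.2f}")
print(f"peak field for dt = 1.1: {unstable:.2e}")
```

The two updates use the same curl matrix and its transpose, which is the
discrete counterpart of the property behind the stability of the scheme below
the time-step limit.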

Multiport Devices. Many electrodynamic devices can be modeled as n-port 
systems, where only the behaviour at distinct input and output ports is of 
interest. So-called generalized ports include the standard transmission line 
ports of microwave devices, but also some more sophisticated input/output
concepts like filament currents or the interaction of electromagnetic fields 
with charged particle beams in accelerating cavities [6]. 

In the most common case, the ports are waveguide-type apertures with
one propagating mode per port¹. If the signals considered are the amplitudes
of incoming ($a_i$) and outgoing ($b_i$) waves, we can define the scattering
matrix S of the structure by b = Sa.

Alternatively the impedance matrix Z is given by the relation u = Zi of
the generalized voltage and current quantities

\[ u_i = (a_i + b_i)\,\sqrt{Z_i}, \qquad
   i_i = \frac{1}{\sqrt{Z_i}}\,(a_i - b_i), \tag{8} \]

with the line impedance $Z_i$ of each port. The waveguide modes are normalized
such that $u_i \cdot i_i = a_i^2 - b_i^2$ describes the power flow through a
port.

¹ Apertures where more than one waveguide mode shall be considered are
conceptually identified as separate ports.

There are two possibilities to simulate such multiport devices with FIT:
Either the ports are treated with a special absorbing boundary condition 
in time or frequency domain [7], where the incoming wave amplitudes ai 
are directly realized as excitation of the structure. This approach yields the 
S-matrix of the structure and is the most efficient approach in time domain. 

In frequency domain it is often advantageous to treat the ports with a 
closed (perfectly magnetic conducting) boundary condition and to excite the 
structure with equivalent input currents ii in the port planes, respectively. 
This technique leads (in the lossless case) to real-valued system matrices 
which are much easier to solve. It directly yields the impedance matrix Z
which can easily be transformed into the scattering matrix S if necessary.
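
The transformation from Z to S follows from relation (8) and u = Zi. With the
port-normalized impedance matrix (each entry divided by the square roots of the
two line impedances involved) one obtains S = (Zn + I)^-1 (Zn - I). The sketch
below implements this for a 2-port; the function name and the test circuit (a
single shunt impedance between two 50 Ohm ports) are assumptions made for the
illustration.

```python
import math


def z_to_s(z, z_line):
    """Convert a 2x2 impedance matrix into the scattering matrix S for the
    given line impedances, using the normalization of relation (8)."""
    d = [math.sqrt(z_line[0]), math.sqrt(z_line[1])]
    zn = [[z[r][c] / (d[r] * d[c]) for c in range(2)] for r in range(2)]
    plus = [[zn[r][c] + (r == c) for c in range(2)] for r in range(2)]
    minus = [[zn[r][c] - (r == c) for c in range(2)] for r in range(2)]
    det = plus[0][0] * plus[1][1] - plus[0][1] * plus[1][0]
    inv = [[plus[1][1] / det, -plus[0][1] / det],
           [-plus[1][0] / det, plus[0][0] / det]]
    return [[sum(inv[r][k] * minus[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


# A shunt impedance Zp between two ports has Z = [[Zp, Zp], [Zp, Zp]].
s_resistive = z_to_s([[50.0, 50.0], [50.0, 50.0]], [50.0, 50.0])
s_reactive = z_to_s([[50j, 50j], [50j, 50j]], [50.0, 50.0])
print(s_resistive[0][0], s_resistive[1][0])
```

For the lossless (purely reactive) shunt element the squared magnitudes of
reflection and transmission add up to one, which is a quick plausibility check
for any conversion routine of this kind.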

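Following here the second approach, we can define coupling matrices B and F
which map the n input currents $i_i$ onto the discrete source current vector
$\mathbf{j}_s$, and the electric field vector e onto the output voltages
$u_i$:

\[ \mathbf{b} = \mathbf{B}\,\mathbf{i}, \qquad
   \mathbf{u} = \mathbf{F}\,\mathbf{x}. \tag{9} \]

B and $\mathbf{F} = \mathbf{B}^T$ contain the discrete field patterns of the
waveguide modes, which can be obtained by the solution of a related 2D
eigenvalue problem for each port. With (4) we obtain the state-space
formulation

\[ \frac{d}{dt}\mathbf{x} = \mathbf{A}\,\mathbf{x} + \mathbf{B}\,\mathbf{i},
   \qquad \mathbf{u} = \mathbf{F}\,\mathbf{x}. \tag{10} \]

In frequency domain the impedance matrix finally is given by

\[ \mathbf{Z} = \mathbf{F}\,(j\omega\mathbf{I} - \mathbf{A})^{-1}\,\mathbf{B}. \tag{11} \]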
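
Formula (11) can be checked on the smallest possible state-space model. The
sketch below assumes a lossless parallel LC resonator driven by a current
source, written in the symmetrized form of (4) with the states sqrt(C) v and
sqrt(L) i_L. This lumped circuit is only a stand-in for a discretized field
problem; its impedance j omega L / (1 - omega^2 L C) is known in closed form.

```python
import math


def lc_impedance(omega, inductance, capacitance):
    """Evaluate Z = F (j*omega*I - A)^-1 B for a parallel LC resonator."""
    w0 = 1.0 / math.sqrt(inductance * capacitance)
    a = [[0.0, -w0], [w0, 0.0]]               # skew-symmetric system matrix
    b = [1.0 / math.sqrt(capacitance), 0.0]   # input coupling vector B
    f = b                                     # output coupling F = B^T
    m = [[1j * omega - a[0][0], -a[0][1]],
         [-a[1][0], 1j * omega - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    y0 = (m[1][1] * b[0] - m[0][1] * b[1]) / det   # solve m y = b (Cramer)
    y1 = (m[0][0] * b[1] - m[1][0] * b[0]) / det
    return f[0] * y0 + f[1] * y1


L_VAL, C_VAL = 1.0e-9, 1.0e-12    # 1 nH and 1 pF, illustrative values
omega = 2.0 * math.pi * 1.0e9     # evaluate at 1 GHz
z_state_space = lc_impedance(omega, L_VAL, C_VAL)
z_closed_form = 1j * omega * L_VAL / (1.0 - omega ** 2 * L_VAL * C_VAL)
print(z_state_space, z_closed_form)
```

For a grid model the 2x2 solve is replaced by a large sparse linear system per
frequency point, which is what motivates the reduced models discussed in
section 3.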



2 Advanced Geometric Modeling Techniques 

The restriction of the original FDTD algorithm to 2D and 3D equidistant 
Cartesian grids is often considered to be the major disadvantage of this 
method. While guaranteeing the simplicity as well as the computational
efficiency of FDTD, Cartesian grids suffer from their poor modeling quality
for applications with complex geometries ('staircase' approximations).

In the following, we will concentrate on two approaches to improve the
geometric modeling capabilities of FIT: the application of triangular grids in
2D with an enhanced technique to derive the related material matrices, and
recent results for so-called sub-cell methods, where the geometrical
information below the level of the cell resolution is taken into account.



2.1 Extended Algorithm for Triangular Grids in 2D 

If conformal triangular grids are used in two-dimensional problems, the
generation of these grids is a well-understood task with a large variety
of efficient algorithms available. The application of the FIT, however, also
requires a dual grid, and in the classic implementation [8] this dual grid
has to fulfill the orthogonality property mentioned above. Such a dual
orthogonal grid can only be found if all primary grid angles are below 90°,
and this demand on the grid generation process cannot always be met [8].

Recently, a more flexible scheme with an extended material operator has
been proposed [9]. For the new material operator we introduce special
interpolating functions, so-called Whitney forms $\mathbf{w}_i$, which are
well-known from Finite Element (FE) approaches [10]. They define a map from
the degrees of freedom of the discrete formulation (here: the electric grid
voltages $e_i$) to field values at arbitrary points inside the triangular
cells:

\[ \mathbf{E}(\mathbf{r}) = \sum_i e_i\,\mathbf{w}_i(\mathbf{r}). \tag{12} \]

Note that this approach is consistent with the integral-type definition of the
electric voltages in (1) due to the relations

\[ \int_{L_i} \mathbf{w}_j(\mathbf{r})\cdot d\mathbf{s} = \delta_{ij}. \tag{13} \]

Knowing the electric field at any point inside the cells, the electric flux
components $d_i$ can be 'exactly' (referring to the interpolation function)
calculated by an integration of (12) over the corresponding dual facets:

\[ d_i = \int_{\widetilde{A}_i} \varepsilon\,\mathbf{E}(\mathbf{r})\cdot d\mathbf{A}
       = \sum_j e_j \int_{\widetilde{A}_i} \varepsilon\,\mathbf{w}_j(\mathbf{r})
         \cdot d\mathbf{A}. \tag{14} \]

Due to the local support of the basis functions, this approach also yields a
sparse (but non-diagonal) material matrix $\mathbf{M}_\varepsilon$. Its
off-diagonal entries

\[ (\mathbf{M}_\varepsilon)_{ij} = \int_{\widetilde{A}_i} \varepsilon\,
   \mathbf{w}_j(\mathbf{r})\cdot d\mathbf{A} \tag{15} \]

($i \neq j$) are generally non-zero also for orthogonal grid systems.
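
The consistency of the Whitney interpolation with the integral definition of
the grid voltages can be verified numerically on a single triangle. The sketch
below assumes the usual lowest-order edge functions built from barycentric
coordinates, w_ij = lambda_i grad(lambda_j) - lambda_j grad(lambda_i), and an
arbitrarily chosen triangle; the helper names are hypothetical. The line
integral of each edge function along its own edge should be one, and zero
along the other two edges.

```python
def grad_lambda(tri):
    """Constant gradients of the three barycentric coordinates of a triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # twice signed area
    grads = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        grads.append(((tri[j][1] - tri[k][1]) / area2,
                      (tri[k][0] - tri[j][0]) / area2))
    return grads


def whitney(tri, grads, edge, p):
    """Evaluate the edge function of edge (i, j) at the point p."""
    i, j = edge
    lam = [1.0 + g[0] * (p[0] - v[0]) + g[1] * (p[1] - v[1])
           for g, v in zip(grads, tri)]
    return (lam[i] * grads[j][0] - lam[j] * grads[i][0],
            lam[i] * grads[j][1] - lam[j] * grads[i][1])


def edge_integral(tri, grads, basis_edge, path_edge, n=4):
    """Midpoint-rule line integral of an edge function along a triangle edge."""
    a, b = tri[path_edge[0]], tri[path_edge[1]]
    t = (b[0] - a[0], b[1] - a[1])
    total = 0.0
    for s in range(n):
        u = (s + 0.5) / n
        w = whitney(tri, grads, basis_edge, (a[0] + u * t[0], a[1] + u * t[1]))
        total += (w[0] * t[0] + w[1] * t[1]) / n
    return total


tri = [(0.0, 0.0), (2.0, 0.3), (0.5, 1.5)]
edges = [(0, 1), (1, 2), (2, 0)]
grads = grad_lambda(tri)
table = [[edge_integral(tri, grads, basis, path) for basis in edges]
         for path in edges]
for row in table:
    print(["%.3f" % value for value in row])
```

Since the tangential component of each edge function is constant along every
edge, the midpoint rule is exact here and the printed table is the identity up
to rounding.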

A thorough analysis of this new matrix reveals that it is non-symmetric in 
the general case, which deteriorates the spatial stability of the discretization 
scheme. However, it can be symmetrized by varying the position of the dual 
point in each cell and thus the shape of the integration areas for the electric 
grid fluxes in (14). It can be shown that there is one unique position for the 
dual point P, which results in a symmetric matrix for both the electric and 
the magnetic material operator [9]. 

Numerical evaluations in [9] show that these new matrices lead as expected
to a 2nd order accurate scheme, where the grid angles of the primary
(triangular) grid can now be extended to a maximum of about 120°. As an
additional result, the new material operator is found to be different from an
FE-type matrix based on Whitney forms as well as from the classic FIT matrix
(even in the case of dual-orthogonal grids). Thus, this derivation supplies a
comparison of FIT and FE discretization schemes on triangular grids, and some
important hints on their differences.



2.2 Extended Conformal Algorithm for Cartesian Grids

The most efficient choice to tackle the staircase problems of FIT in three
dimensions is probably to use so-called sub-cell methods, which are able to
model arbitrary partial material fillings within Cartesian grid cells.

In the partially filled cells (PFC) methods proposed in [11] and [12] this
is realized for interfaces to perfect electric conducting (PEC) materials by
considering reduced cell lengths and areas in the derivation of the material
matrices. As shown in Fig. 1, left, this technique may lead to very small
dimensions of the 'remaining' cell. This is no problem for the accuracy of the
PFC method, but it may induce a considerable overhead in the time domain
algorithm, as such small cell lengths require a reduction of the maximum
stable time step according to the Courant criterion.
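
The size of this effect can be estimated with the familiar Courant limit of a
uniform Cartesian grid. The sketch below is only a rough illustration: it
treats the reduced cell as if the whole grid had its dimensions, whereas the
actual limit of a PFC scheme depends on the reduced lengths and areas entering
the material matrices. The cell sizes are assumed values.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_limit(dx, dy, dz, c=C0):
    """Largest stable leapfrog time step on a uniform Cartesian grid."""
    return 1.0 / (c * math.sqrt(dx ** -2 + dy ** -2 + dz ** -2))


regular = courant_limit(1.0e-3, 1.0e-3, 1.0e-3)  # 1 mm cubic cell
reduced = courant_limit(1.0e-4, 1.0e-3, 1.0e-3)  # one edge cut to 10 percent
ratio = reduced / regular
print(f"regular cell: {regular:.3e} s, reduced cell: {reduced:.3e} s")
print(f"time step shrinks to {ratio:.1%} of the regular value")
```

A single strongly reduced cell would thus dictate a time step several times
smaller for the entire grid, which is the overhead the text refers to.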




Fig. 1. PFC-approach (left) with reduced cell edges, and new USC-scheme applying 
a larger virtual cell to avoid the reduction of the maximum stable time step (right). 



As a remedy, an extended version of these schemes, named Uniformly 
Stable Conformal (USC) scheme has been proposed in [13]. The idea of the 
new method is to use also the information from adjacent cells to complete 
the local curl-operation. Thus, the effective cell size is enlarged instead of 
reduced, and the same time step can be used as in the standard scheme. 




Fig. 2. Performance of the USC scheme: Relative field error δ of eigensolutions
in a cylindrical cavity with diameter d (left), and maximum stable time step
(normalized to the Courant limit, right) for a varying grid step size Δ.

The additional computational cost of this approach is usually negligible,
as off-diagonal elements in the material matrices have to be processed only
for a small number of field components (at the PEC interfaces). Some
numerical results for the USC scheme compared to the standard PFC approach
are shown in Fig. 2. The accuracy of the USC method is typically as good as
with PFC, whereas the larger time step yields a speed-up factor in time
domain of about two.

3 Highly Resonant Structures 

In standard time domain methods, the long settling times in highly resonant 
devices such as cavities and microwave filters require a large number of time 
steps to reach steady state. If the goal of the simulation is a precise pre- 
diction of the system’s response near a resonance, the use of approximative 
techniques, such as introducing artificial losses or non-conservative time in- 
tegration schemes, is out of the question. On the other hand, a conventional 
eigenvalue analysis (in frequency domain) of the ’closed’ structure, with short 
circuits at the input/output ports, only gives a hint on the resonance frequen- 
cies, but no quantitative results for the behavior of the original (open) device 
around that frequency. 

3.1 Model Order Reduction 

An alternative approach to analyze highly resonant devices arises from the 
idea that the number of degrees of freedom in the original system (2) (the 
number of e and h-components in the grid) is typically much higher than 
the number of poles and zeros which determine its system response within a 
certain frequency range. Thus, the goal of Model Order Reduction (MOR)- 
approaches is to approximate the system by a smaller one before actually 
solving it. 

Lanczos-based Techniques. The Lanczos algorithm generates two Krylov
spaces, spanned by vectors $\mathbf{v}_i$ and $\mathbf{w}_i$
($i = 1 \ldots p$) with

\[ \mathbf{W}^T\mathbf{V} = \mathbf{I} \quad\text{and}\quad
   \mathbf{W}^T\mathbf{A}\mathbf{V} = \mathbf{T}, \tag{16} \]

where T is a banded matrix. Using the (approximative) projection x = Vz
leads to

\[ \mathbf{Z}' = (\mathbf{F}\mathbf{V})\,(j\omega\mathbf{I} - \mathbf{T})^{-1}\,
   (\mathbf{W}^T\mathbf{B})
   = \mathbf{F}'\,(j\omega\mathbf{I} - \mathbf{T})^{-1}\,\mathbf{B}'. \tag{17} \]

The dimension of this reduced system is only p × p, and if the projection
takes into account all dominant eigenvectors of the original system, Z' will
be a good approximation of the impedance matrix Z.

If the Lanczos algorithm is applied to the shifted and inverse matrix
$(\mathbf{A} - s_0\mathbf{I})^{-1}$ we obtain a PVL approximation which needs
a considerably lower number of basis vectors. However, inverting the matrix is
computationally very expensive and can only be performed for small systems.
More details and an algorithm to combine both ideas can be found in [15].
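
The projection properties in (16) can be observed on a small example. The
sketch below is restricted to the symmetric special case, in which the two
bases coincide (W = V) and T becomes tridiagonal; it uses full
re-orthogonalization for robustness and an arbitrary 5x5 test matrix. It is
not the two-sided algorithm of [15], and all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def lanczos(a, start, p):
    """Build p orthonormal Krylov vectors for a symmetric matrix a and
    return them together with the projected matrix T = V^T A V."""
    norm = math.sqrt(dot(start, start))
    basis = [[x / norm for x in start]]
    for _ in range(p - 1):
        w = matvec(a, basis[-1])
        for q in basis:                      # orthogonalize against all
            coeff = dot(w, q)                # previous basis vectors
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([x / norm for x in w])
    t = [[dot(vi, matvec(a, vj)) for vj in basis] for vi in basis]
    return basis, t


n, p = 5, 3
a = [[1.0 / (1 + abs(i - j)) + (i + 1 if i == j else 0.0) for j in range(n)]
     for i in range(n)]
basis, t = lanczos(a, [1.0] * n, p)
gram = [[dot(u, v) for v in basis] for u in basis]
for row in t:
    print(["%.4f" % value for value in row])
```

The printed 3x3 matrix has vanishing corner entries, i.e. it is tridiagonal,
and it replaces the 5x5 matrix in the reduced transfer function (17).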



Modal Approach. A very similar technique is obtained if, instead of the
Lanczos basis vectors $\mathbf{v}_i$, several eigenvectors $\mathbf{x}_i$ of
the system are used in the projection above. The accuracy of this modal
approach [6] depends on the eigenmodes which are not considered in the
expansion, and their coupling to the input quantities $\mathbf{b}_n$. For
eigenfrequencies well above the frequency range of interest,
$\omega_i \gg \omega$ ($i > p$), these contributions can be approximated to
depend only weakly on $\omega$. Thus, a constant correction vector can be
added which is based on the solution of the inhomogeneous equation (6) for one
frequency $\omega_0$ [16]. This yields for one entry in the impedance matrix:

\[ Z_{mn}(\omega) \approx \sum_{i=1}^{p}
   \frac{(\mathbf{b}_m^T\mathbf{x}_i)\,(\mathbf{x}_i^T\mathbf{b}_n)}
        {j\omega - j\omega_i}
   + \text{(constant correction term obtained at } \omega_0\text{)}. \tag{18} \]



For the required solution of the inhomogeneous curl-curl equation a good start 
solution for iterative solvers is supplied by the (truncated) modal expression. 

An important advantage of this modal approach in combination with 
the correction term is that only a moderate number of modes is needed for 
accurate results. In contrast to the Lanczos scheme, they can typically be 
completely held in memory and be used for the construction of field solutions 
at arbitrary frequencies. 
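
The effect of the correction term can be illustrated with a scalar toy model.
The sketch below assumes a response given as a sum of simple pole terms with
unit residues, of which only the two lowest are kept, and it takes the
correction as the difference between the full and the truncated response at
one frequency (standing in for the single solution of (6)). The pole values
and this particular definition of the correction are assumptions for the
illustration, not data from [16].

```python
def modal_sum(omega, poles, residues):
    """Sum of pole terms r_i / (j*omega - j*omega_i)."""
    return sum(r / (1j * omega - 1j * w) for w, r in zip(poles, residues))


poles = [1.0, 1.3, 8.0, 9.0, 12.0]  # normalized eigenfrequencies (assumed)
residues = [1.0] * len(poles)       # unit coupling to the ports (assumed)
p = 2                               # modes kept in the expansion
omega_0 = 1.15                      # frequency of the single full solution
omega = 1.2                         # frequency at which the model is used

full = modal_sum(omega, poles, residues)
truncated = modal_sum(omega, poles[:p], residues[:p])
correction = (modal_sum(omega_0, poles, residues)
              - modal_sum(omega_0, poles[:p], residues[:p]))

err_truncated = abs(full - truncated)
err_corrected = abs(full - (truncated + correction))
print(f"error without correction: {err_truncated:.4f}")
print(f"error with correction:    {err_corrected:.4f}")
```

Because the neglected poles lie far above the band of interest, their joint
contribution is almost constant there, and a single constant removes most of
the truncation error.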



4 Numerical Application 

As a numerical example we present the calculation of effective material
coefficients in so-called metamaterials [18-20]. Metamaterials consist of a
lattice of conducting, non-magnetic elements that can be described by an
effective magnetic permeability $\mu_{\mathrm{eff}}$ and an effective
electrical permittivity $\varepsilon_{\mathrm{eff}}$. Both of these parameters
can exhibit values not found in naturally occurring materials (e.g.
simultaneously $\mu_{\mathrm{eff}} < 0$ and $\varepsilon_{\mathrm{eff}} < 0$),
with some very interesting consequences for wave propagation [18]. Typically
each cell of the lattice represents a resonant structure with spatial
dimensions much smaller than the incident wavelength.

Fig. 3 (taken from [20]) shows a metamaterial consisting of an array of 
split ring resonators (SRRs) and accompanying wires, which has been subject 
to measurements as well as simulations before ([20], and in a simplified form in 
[19]). The task for the numerical methods is to perform field simulation in this 
structure and to extract the material parameters from the electromagnetic 
fields over a certain frequency range.

Fig. 3. SRR-type metamaterial and one cell in the simulation model.



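The effective material parameters in metamaterials describe an equivalent
homogeneous medium which has the same propagation properties for
(macroscopic) plane waves as the lattice. According to [18], they can be
calculated from averaged field quantities in one cell of the lattice, e.g. for
the permeability:

\[ H_{x,\mathrm{ave}}(\omega) = \frac{1}{L_x}\int_{L_x}
   \mathbf{H}(\omega,\mathbf{r})\cdot d\mathbf{r}, \qquad
   B_{x,\mathrm{ave}}(\omega) = \frac{1}{S_x}\int_{S_x}
   \mathbf{B}(\omega,\mathbf{r})\cdot d\mathbf{A}, \tag{19} \]

where $L_x$ is the length of the cell, $S_x$ is its normal facet, and the
effective permeability follows from the ratio of the two averaged quantities,
$\mu_{\mathrm{eff},x}(\omega) = B_{x,\mathrm{ave}}(\omega) /
H_{x,\mathrm{ave}}(\omega)$.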

It can be shown [16] that for waves in coordinate direction only one cell
of the lattice has to be simulated. This cell can then be considered as a
2-port structure, and for the extraction of the effective material parameters
the complete 2 × 2 scattering matrix S or the impedance matrix Z has to be
determined.

Our first discrete model consists of a rather coarse mesh with only
22 × 10 × 9 = 1,980 grid points. Due to the small dimensions of the SRR, this
is still equivalent to more than 70 lines per wavelength. However, to be able
to model the geometric details of the SRR in this mesh, a sub-cell method as
described above has to be applied. In a second simulation, a finer mesh with
38 × 18 × 16 = 10,944 grid points (130 lines per wavelength) is used.
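
The quoted resolutions can be reproduced by a short estimate. It assumes,
purely for illustration, that the 22 and 38 grid points lie along the lattice
direction and are spaced uniformly, and it uses the lattice parameter of
1.46 cm and the wavelength of about 5 cm at 6 GHz that are quoted in the
discussion of the computational cost later in this section.

```latex
\Delta_{\text{coarse}} \approx \frac{1.46\,\text{cm}}{22} \approx 0.066\,\text{cm},
\qquad
\frac{\lambda}{\Delta_{\text{coarse}}} \approx \frac{5\,\text{cm}}{0.066\,\text{cm}} \approx 75,
\\
\Delta_{\text{fine}} \approx \frac{1.46\,\text{cm}}{38} \approx 0.038\,\text{cm},
\qquad
\frac{\lambda}{\Delta_{\text{fine}}} \approx \frac{5\,\text{cm}}{0.038\,\text{cm}} \approx 130.
```

Both values agree with the "more than 70" and "130" lines per wavelength
stated above, although the actual grids need not be uniform.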

As a result, Fig. 4 shows the scattering parameter (reflection $S_{11}$) of
one cell, as well as the real part of the effective permeability for the
frequency range from 4 GHz to 5.5 GHz. The coarse discretization already gives
qualitatively good results, but exhibits a frequency shift of about 200 MHz
compared to the simulation with the finer mesh. The extracted permeability
is, as expected, negative near resonance.

For the simulation of this resonant structure different approaches have
been applied (with indistinguishable result curves): a standard time domain
simulation (TD), the Lanczos-based model-order-reduction technique (MOR)
according to [15], and the modal approach with correction term (M+C). The
computational cost is summarized in Table 1. Note that one time step in
TD is equivalent to one matrix-vector operation in the other approaches, and
that in TD, FD, and for the correction term in the modal approach two runs
are needed to obtain the full S or Z-matrix. For comparison, also a direct
solution of the curl-curl equation (6) for a single frequency $\omega_0$
(frequency domain, FD) is included.



Fig. 4. Scattering parameter (reflection coefficient $S_{11}$, left) of one
cell, and real part of the effective permeability in the SRR medium (right),
plotted over the frequency f / GHz from 4 to 5.5. The results of the fine and
the coarse grid resolution agree qualitatively, but exhibit a slight frequency
shift of about 200 MHz.



Table 1. Computational cost (number of matrix-vector operations) for different
methods: Time domain (TD), frequency domain (FD, single frequency point), model
order reduction (MOR), modal approach with correction (M+C).

grid points   TD           FD               MOR           M+C        M+C
                           (single freq.)   (no fields)   10 modes   correction
1,980         2 × 50,000   2 × 300          260           ~1,100     2 × 70
10,944        2 × 85,000   2 × 600          520           ~2,300     2 × 180



Due to the small geometric details compared to the wavelength (lattice
parameter $L_x$ = 1.46 cm compared to λ ≈ 5 cm at 6 GHz, grid step sizes
Δ < 0.07 cm), the model is spatially 'oversampled'. In time domain with
the Courant limit for a stable time integration this leads to a large number
of time steps per period (1/(f · Δt) = 700 or 1000 for the coarse and the
fine grid, respectively). Thus, together with the long settling times, the
time domain approach is not competitive in this application.

As long as only the S-parameters are needed, the most efficient approach 
is the MOR method. However, since the complete Krylov-space generated in 
MOR can usually not be held in memory, no field solutions are available for 
the extraction of the effective permeability. 
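
The relative cost of the methods can be read off Table 1 with a few lines of
arithmetic. The sketch below simply adds up the matrix-vector operations per
method (two runs where the table says so, and the approximate mode-solver
counts taken at face value) and reports the speed-up over the time domain
run. The FD column is left out because it covers a single frequency point
only.

```python
# Matrix-vector operations from Table 1, keyed by the number of grid points.
cost = {
    1980: {"TD": 2 * 50_000, "MOR": 260, "M+C": 1_100 + 2 * 70},
    10944: {"TD": 2 * 85_000, "MOR": 520, "M+C": 2_300 + 2 * 180},
}

speedup = {
    grid: {method: ops["TD"] / ops[method] for method in ("MOR", "M+C")}
    for grid, ops in cost.items()
}

for grid, factors in speedup.items():
    print(f"{grid:>6} grid points: MOR {factors['MOR']:.0f}x, "
          f"M+C {factors['M+C']:.0f}x fewer operations than TD")
```

These operation counts indicate a gain of more than two orders of magnitude
for MOR and well over one order of magnitude for the modal approach, with the
caveat that the cost of one operation is not identical across methods.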

The computational cost of the modal approach depends mainly on the
eigenmode solver applied (and often also on its internal settings). Although
in this case 2 or 3 eigenmodes in the expansion are sufficient for accurate
results, we recommend computing a set of at least 7 to 10 modes: the
computation time in this range is only slightly higher than for 3 or 5 modes,
and the equation solvers in the correction part benefit from the improved start
solution. Furthermore, the number of modes actually used can be varied to
provide a simple convergence check in the post-processing.

5 Conclusion 

Together with the recently developed sub-cell extensions for the FIT (or 
FDTD) discretization scheme, this technique can be considered to represent 
the state-of-the-art method for a large area of microwave applications. If the
flexibility of a Cartesian computational grid is not sufficient, the FIT
approach allows switching to generalized grid types without changing the
algebraic notation or losing its important conservation and stability properties.
Whereas in 3D Cartesian grids seem to be superior to most other grid types 
in terms of their easy and robust data structure (grid generation, data format 
and access), triangular grids can be a competitive alternative for many 2D 
applications. 

The efficiency of FIT's time domain approach is not satisfactory in some
cases, especially if highly resonant devices are to be handled. Such an
application is the simulation of so-called metamaterials, where the electromagnetic
fields can be described by negative effective material parameters in a small 
frequency band near the resonance. 

The proposed model order reduction techniques, based on a projection 
of the electromagnetic fields on a Krylov subspace or the subspace defined 
by some eigensolutions of the model, turn out to be very efficient and robust 
tools for such applications. Depending on the required output data — transfer 
matrices or also field solutions — considerable speed-ups in computation time 
can be achieved.



Abstract. The paper explores a class of “Finite Element Difference” (FED) schemes
with Finite Difference-type data structures but based on Finite Element - varia- 
tional principles. Curved material boundaries are approximated algebraically on 
relatively coarse regular rectangular or hexahedral grids by a judicious choice of lo- 
cal approximating functions, rather than geometrically on conforming meshes. The 
grids do not have to resolve small geometric details. The proposed approach
combines the ideas of the Generalized Finite Element - Partition of Unity
methods, Discontinuous Galerkin Methods and Finite Difference / Finite Volume /
Finite Integration Techniques.



1 Introduction: Finite Differences and Finite Elements 
for Complex Geometries 

The major complicating factors in numerical simulation of fields are the pres- 
ence of curved interface boundaries between materials with different param- 
eters and point / edge / corner singularities. The standard Finite Difference 
(FD) and Finite Element (FE) Methods do not handle singularities well. 
Curved boundaries can be accurately rendered in conventional FE analysis by 
geometrically conforming meshes, whereas in FD a ‘staircase’ approximation, 
with related numerical artifacts and loss of accuracy, is typical. The success 
and popularity of FEM in the engineering community can be attributed, to 
a large extent, to its ability to handle complex geometries - although on a 
more fundamental level it can be argued that the advantages of FEM stem 
from its underlying basic variational principles. 

However, the geometric and physical flexibility of FEM comes with a price. 
Mesh generation can be complicated, especially in 3D, and the associated 
data structures are complex as well. Consequently, FEM is not competitive 
with FD methods for simple geometries. Our aspiration is to explore a class 
of “Finite Element Difference” (FED) methods loosely defined as FD-style 
schemes with FE-type capabilities, with as many of the following features as 
possible: 

— Finite difference data structures and schemes based on Finite Element 
variational principles. 
W. H. A. Schilders et al. (eds.), Scientific Computing in Electrical Engineering 
© Springer-Verlag Berlin Heidelberg 2004 




Generalized Finite Element Difference Methods 






— Ability to handle curved interface boundaries on relatively coarse regular 
rectangular or hexahedral grids with high accuracy and without having 
to resolve small geometric details. 

— Flexibility in the choice of local approximating functions. 

— Few or no additional unknowns other than the nodal values of the 
potential. 

— For elliptic problems, symmetric positive definite discrete systems. 

— Feasibility of h- and p-type refinement. 

— Optimal convergence rate. 

The following computational techniques have some, or many, of these desired 
features and are relevant to our analysis: 

1. Generalized FEM by Partition of Unity [27], [3], [20], [39]. 

2. Heuristic homogenization schemes by volume averaging [18] or edge- 
length averaging [46]. 

3. Homogenization schemes based on variational principles [28]. 

4. “Finite Integration Techniques” (FIT) and its extensions and enhance- 
ments developed by researchers in Darmstadt [24], [13]. 

5. Discontinuous Galerkin (DG) methods [1], [8], [11], [14]. 

6. Methods rooted deeply in differential geometry and algebraic topology 
[7], [22], [25], [40], [41]. 

7. Miscellaneous other techniques, such as special elements with holes or 
inclusions [26], [37], and “immersed surface” methodology [44]. The latter 
modifies the Taylor expansions to account for derivative jumps at material 
boundaries but leads to rather unwieldy expressions. 

Obviously, FD-FV-FIT methods have always been oriented toward regular 
meshes, and the recent advancements in their application to inhomogeneous 
problems are stimulating (see Acknowledgment). Nevertheless variational ap- 
proaches remain valuable, as they tend to automatically preserve important 
physical and mathematical characteristics of the problem. Section 2 reviews 
some of the existing techniques that are most closely related to the mate- 
rial of this paper. A FED-type method of Section 3 is a natural extension 
of the variational-difference scheme proposed by Moskow et al. [28] and can 
be viewed as a nonconforming FEM. Numerical experiments are described in 
Section 4. 

The new and promising framework of “multivalued approximation” out- 
lined in Section 5 offers a simple and general way of constructing FED-type 
schemes. A discussion of the results and future directions appears in Section 6. 

2 Numerical Methods for Inhomogeneous Media 

2.1 Generalized FEM by Partition of Unity 

A detailed explanation and analysis of this method, proposed originally by 
Babuska and Melenk [27], [3], is widely available (e.g. [39]). Of all the interesting 
features of GFEM, the most salient for our present goals is the ability 
to employ a variety of special nonpolynomial approximating functions. In 
particular, functions representing jumps of the normal derivatives of the potential 
at interface boundaries can be added to the basis. Strouboulis et al. 
[39] present an extensive set of application examples with special functions 
for material inclusions in stress analysis. Babuska et al. [2] apply GFEM to 
problems with material interfaces. Plaks et al. [31] implemented GFEM for 
problems with magnetized nanoparticles. 

In GFEM the computational domain $\Omega$ is considered to be covered with 
overlapping subdomains (‘patches’) $\Omega_i$, and different local approximations are 
merged by a Partition of Unity (PU) $\{\varphi_i\}$, $1 \le i \le n_{\mathrm{patches}}$, constructed on this system 
of patches: 

$$\sum_{i=1}^{n_{\mathrm{patches}}} \varphi_i \equiv 1 \ \text{in}\ \Omega, \qquad \mathrm{supp}(\varphi_i) = \Omega_i$$ (1) 

That is, each function $\varphi_i$ is associated with the respective patch $\Omega_i$ and 
vanishes outside that patch. Then the global solution $u$ can be decomposed 
into its ‘patch components’ $u_i$: 

$$u = \sum_i u \varphi_i = \sum_i u_i, \quad \text{with}\ u_i \equiv u \varphi_i$$ (2) 

Decomposition (2) can be used in a natural way for assembling the global 
approximation of the solution from the local ones. Suppose that locally, within 
each patch $\Omega_i$, the exact solution $u$ can be approximated by a linear 
combination of some approximating functions $\{g_m^{(i)}\}$: 

$$u_{hi} = \sum_m c_m^{(i)} g_m^{(i)}$$ (3) 

$c_m^{(i)}$ being some (real- or complex-valued) coefficients. The final system of 
approximating functions $\{\psi_m^{(i)}\}$ is built with $\varphi_i$ as weight functions: 

$$\psi_m^{(i)} = g_m^{(i)} \varphi_i$$ (4) 

Multiplication (4) by $\varphi_i$ guarantees seamless merging of patch-wise 
approximations, with rigorously provable estimates of the global error in terms of 
local errors and the norms of the PU functions $\varphi$ [3]. On the negative side, 
however, this multiplication complicates the set of approximating functions 
and tends to make it more ill-conditioned (in some cases even linearly 
dependent, see [3]). The computation of gradients, implementation of the Dirichlet 
conditions and numerical Galerkin quadratures also get more complicated. 
In addition, GFEM-PU may lead to a combinatorial increase in the number 
of degrees of freedom. For illustration, consider a regular hexahedral mesh 
where a ‘patch’ is defined as a set of eight hexahedra around a common node. 
In the presence of material boundaries, it is sensible to replace the usual eight 
trilinear basis functions with eight special functions satisfying the derivative 
jump condition at the interface, as described in the following subsection. 
In GFEM-PU, each of these special functions gets multiplied by the ‘shape 
function’ of the patch. As each hexahedral element of the mesh is an 
intersection of eight patches (centered at its eight respective nodes) and each of 
these patches contributes eight approximating functions, the stiffness matrix 
for elements close to material interfaces is 64 × 64 instead of the usual 8 × 8. 
For all of the above reasons, alternative approaches may be worth exploring. 
One such approach is discussed in Section 3 and another promising one [43] 
is outlined in Section 5. 
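
The patch-wise construction (1)-(4) can be illustrated in one dimension. The following is only a minimal sketch under stated assumptions, not the construction used in the cited works: it assumes a uniform 1D grid, piecewise-linear 'hat' functions as the PU, and a hypothetical local function g(x) = |x - 0.5|; the names `hat`, `pu_sum` and `psi` are illustrative.

```python
import numpy as np

def hat(x, nodes, i):
    """Piecewise-linear PU function phi_i: 1 at nodes[i], 0 at all other nodes."""
    values = np.zeros(len(nodes))
    values[i] = 1.0
    return np.interp(x, nodes, values)

nodes = np.linspace(0.0, 1.0, 6)      # uniform grid (assumed for illustration)
x = np.linspace(0.0, 1.0, 201)        # sample points

# Property (1): the hat functions sum to one everywhere in the domain.
pu_sum = sum(hat(x, nodes, i) for i in range(len(nodes)))

# Decomposition (2): u = sum_i u * phi_i.
u = np.exp(x)
u_rebuilt = sum(u * hat(x, nodes, i) for i in range(len(nodes)))

# Equation (4): a GFEM basis function is a local function g times phi_i.
# Here g(x) = |x - 0.5| is a hypothetical local function with a derivative jump.
psi = np.abs(x - 0.5) * hat(x, nodes, 2)

# Count from the text: 8 patches overlap on a hexahedron,
# each contributing 8 special functions.
element_matrix_size = 8 * 8
```

The product `psi` inherits the compact support of the hat function (here the patch [0.2, 0.6]), which is what makes the merged approximation conforming, at the cost of a more complicated basis.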



2.2 Generalized Special FEM for ‘Rough’ Coefficients 



Babuska et al. [2] proposed special approximating functions satisfying the 
boundary condition on material interfaces for an elliptic problem where the 
coefficient can vary sharply in at most one coordinate direction: 

$$-\nabla \cdot a(x) \nabla u(x, y) = f(x, y)$$ (5) 

in a rectangular domain with homogeneous Dirichlet boundary conditions. 
The coordinate transformation 

$$\tilde{x}(x) = \int_0^x \frac{ds}{a(s)}, \qquad \tilde{y}(y) = y$$ (6) 

“removes” the derivative jumps at material discontinuities. Therefore the 
conventional FEM on a standard (say, triangular) mesh can be considered in 
the transformed domain. Corresponding to that, there is an FE method on 
a curvilinear mesh in the original domain. The approximating functions are 
piecewise-linear with respect to the transformed coordinates $\tilde{x}$, $\tilde{y}$ but are 
more complex in the original coordinates. 
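
A small numerical check makes the effect of the mapping (6) concrete. The sketch below assumes a piecewise-constant coefficient with a single jump at a hypothetical position `x_int` and a lower integration limit of zero; it verifies that a function linear in the transformed coordinate has a continuous flux a du/dx across the jump, although its derivative is discontinuous.

```python
import numpy as np

def x_tilde(x, x_int, a_left, a_right):
    """Transformed coordinate (6) for a coefficient equal to a_left for
    x < x_int and a_right for x > x_int (assumed piecewise constant)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_int, x / a_left,
                    x_int / a_left + (x - x_int) / a_right)

x_int, a_left, a_right = 0.4, 1.0, 10.0   # illustrative values
eps = 1e-6

def v(x):
    """A function that is linear in the transformed coordinate."""
    return 2.0 + 3.0 * x_tilde(x, x_int, a_left, a_right)

# One-sided difference quotients at the interface.
dv_left = (v(x_int) - v(x_int - eps)) / eps
dv_right = (v(x_int + eps) - v(x_int)) / eps

flux_left = a_left * dv_left      # a * dv/dx just left of the interface
flux_right = a_right * dv_right   # a * dv/dx just right of the interface
```

The two one-sided derivatives differ by the factor a_right / a_left, while the two fluxes agree up to the finite-difference error.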

This coordinate mapping forms a common basis for three different varia- 
tional schemes in [2]. The first approach is to use the special approximating 
functions on curvilinear triangles and the same Galerkin test functions. The 
method is conforming and is equivalent to the standard FEM in the trans- 
formed domain. 

In the second approach, regular (rather than curvilinear) triangles are 
employed in the original domain, and the approximating functions are the 
same as in the previous method. The standard piecewise-linear continuous 
test functions are applied. Since the spaces of approximating and test func- 
tions are in this case different, nonsymmetric systems result and the stability 
constant is higher. The approximation space is not a subspace of $H^1(\Omega)$, 
and hence this version of the method is nonconforming. The third approach 
of [2] fits in the framework of GFEM-PU [3] (historically, though, GFEM 
was formally introduced a few years after [2]). An ordinary triangular mesh 
is used, and the standard nodal basis functions form a PU. The local ap- 
proximating functions, while the same as in the previous two approaches, get 
multiplied by the PU functions in accordance with the GFEM-PU procedure. 
The method is conforming, standard meshes are used and matrix symmetry 
is preserved - thus this approach is arguably the best of the three. However, 
the multiplication by the PU functions in GFEM does add some complexity 
and redundancy, as discussed above. 

The authors of [2] extend their first and third approaches to a general 
case where the material coefficient can vary sharply locally in at most one 
direction. The extension of the first method also relies on curvilinear meshes, 
and the extension of the third method is again a GFEM-PU algorithm. The 
description, analysis and implementation of these methods in this general 
setting are rather complex and for the first method may be impractical. 

2.3 Homogenization Schemes Based on Variational Principles 

This interesting approach uses the special approximating functions of [2] as 
a starting point for the construction of a difference scheme. Within each 
grid cell, the authors seek a (generally anisotropic) material parameter such 
that the discrete and continuous energy inner products are the same over 
the chosen discrete space. The overall construction in [28] relies on a special 
partitioning of the grid ( “red-black” numbering, or the “Lebedev grid” ) and 
on a specific, central difference, representation of the gradient. A rigorous 
proof of discrete weak convergence for the proposed scheme is given, and the 
numerical experiments show that the method can give accurate results even 
for curved boundaries and high contrasts of material parameters between 
different regions. 

The method proposed in Section 3 can be viewed as a generalization of 
the variational-difference approach of [28]. The special ‘Lebedev’ grids and 
the specific approximation of gradients by central differences adopted in [28] 
turn out not to be really critical for the algorithm. 

2.4 Discontinuous Galerkin Methods 

The idea to relax the interelement continuity requirements of the standard 
FEM and to use nonconforming elements appeared at the early stages of FE 
research ([17], [38]). For example, in the Crouzeix-Raviart elements [17], the 
continuity of piecewise-linear functions is imposed only at midpoints of the 
edges. 

Over recent years, a substantial amount of work has been devoted to 
Discontinuous Galerkin Methods (DGM) [1], [14]; a consolidated view with 
extensive bibliography is presented in [1]. Many of the approaches start with 
the “mixed” formulation that includes additional unknown functions for the 
fluxes on element edges (2D) or faces (3D). However, these additional un- 
knowns can be replaced with their numerical approximations, thereby pro- 
ducing a “primal” variational formulation in terms of the scalar potential 
alone. In DGM, the interelement continuity is ensured, at least in the weak 
sense, by retaining the surface integrals of the jumps, generally leading to 
saddle-point problems even if the original equation is elliptic. 

2.5 Heuristic Homogenization Schemes 

Finite Difference Time Domain methods [45] of applied electromagnetics re- 
quire enormous computational work due to a large number of time steps for 
numerical wave propagation. Therefore regular orthogonal grids are strongly 
preferred and the need to avoid ‘staircase’ approximations of curved bound- 
aries has been quite acute. To this end, Yu, Dey and Mittra advocate very 
simple homogenization techniques where the equivalent material parameter 
for grid cells containing two materials is obtained by either volume averag- 
ing [18] or edge-length averaging [46]. Numerical experiments have shown 
good results but the method remains heuristic, and its accuracy and possible 
limitations need further theoretical and practical investigation. 
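
The volume-averaging idea can be sketched as follows. This is not the implementation of [18] or [46]; it is a generic illustration that assumes a 2D square cell, a circular inclusion and simple midpoint sampling to estimate the volume fraction. The function name and all parameter values are hypothetical.

```python
import numpy as np

def volume_averaged_sigma(cell_min, h, center, radius,
                          sigma_in, sigma_out, samples=200):
    """Volume-average sigma over the square cell [cell_min, cell_min + h]^2
    that contains (part of) a disc; midpoint sampling on a samples x samples lattice."""
    offsets = (np.arange(samples) + 0.5) * h / samples
    xs = cell_min[0] + offsets
    ys = cell_min[1] + offsets
    X, Y = np.meshgrid(xs, ys)
    inside = (X - center[0]) ** 2 + (Y - center[1]) ** 2 <= radius ** 2
    fraction = inside.mean()          # estimated volume fraction of the disc
    return fraction * sigma_in + (1.0 - fraction) * sigma_out

h, r0 = 0.2, 0.05
# The same disc at two different positions inside one cell.
centered = volume_averaged_sigma((0.0, 0.0), h, (0.10, 0.10), r0, 10.0, 1.0)
shifted = volume_averaged_sigma((0.0, 0.0), h, (0.07, 0.13), r0, 10.0, 1.0)
```

Up to the sampling error, both calls return the same equivalent parameter: the averaged value depends only on the volume fraction of the inclusion, not on where it sits inside the cell.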

2.6 Finite Integration Techniques 

Another class of methods for applied electromagnetics is Finite Integration 
Techniques (FIT) developed primarily by researchers in Darmstadt [12], [24], 
[35], [36]. Various extensions and enhancements of FIT, such as the “Perfect 
Boundary Approximation” (PBA) technique and “Conformal FIT”, approximate 
curved material interfaces on regular orthogonal grids. Although a detailed 
description of PBA has not yet been published, PBA is apparently 
related to some of the homogenization techniques mentioned in the previous 
subsection. FIT utilizes the integral form of Maxwell’s equations imposed on 
a pair of grids. In recent years, this model has been gaining popularity quite 
rapidly (see e.g. [13], [24], [35], [36]). 

3 A Finite Element Difference Method 

3.1 The General Setup 

For simplicity and clarity, this paper focuses on a model linear elliptic bound- 
ary value problem with homogeneous Dirichlet conditions: 

$$-\nabla \cdot \sigma \nabla u = f \ \text{in}\ \Omega \subset \mathbb{R}^n \ (n = 2, 3); \qquad u|_{\partial\Omega} = 0$$ (7) 

Here $\sigma$ is a generic material parameter that can be discontinuous across 
material boundaries and can depend on coordinates but not, in the linear case 
under consideration, on the potential $u$. The physical meaning of $\sigma$ depends 
on the nature of the problem, and we shall keep in mind electrostatics and 
magnetostatics as practical examples. 

Due to special choices of approximating functions, regular hexahedral 
or rectangular grids will in many cases suffice and will therefore be used 
exclusively in the remainder of the paper - even though the approach applies 
to general FE meshes equally well. Consequently, the domain will be assumed 
hexahedral / rectangular. The computational problem is nontrivial due to the 
presence of nonconforming material interface boundaries - straight or curved 
- with the standard interface conditions 

$$u_1 = u_2 \ \text{on}\ \Gamma$$ (8) 

$$\sigma_1 \frac{\partial u_1}{\partial n} = \sigma_2 \frac{\partial u_2}{\partial n} \ \text{on}\ \Gamma$$ (9) 

where the subscripts refer to the two subdomains $\Omega_1$ and $\Omega_2$ sharing the 
material boundary $\Gamma$, and $n$ is the normal direction to $\Gamma$. A regular mesh is 
obtained by subdividing each edge of $\Omega$ into a number of subintervals, not 
necessarily equal. Let $\Omega_j$ be a brick element of the mesh, so that $\Omega = \cup\Omega_j$, 
$j = 1, 2, \ldots, m$. The energy inner product is defined as 

$$(u, v)_{\sigma,\omega} = \int_{\omega} \sigma \nabla u \cdot \nabla v \, d\omega \ \text{in}\ H^1(\omega) \times H^1(\omega)$$ (10) 

for any domain $\omega \subset \Omega$ with a Lipschitz boundary. 

3.2 Special Approximating Functions 

This section reviews the construction of special approximating functions pro- 
posed by Babuska et al. [2] and later used by Moskow et al. [28]. (The con- 
struction is presented here in a somewhat modified form convenient for our 
purposes.) The main observation is that the normal derivative of a poten- 
tial u satisfying the jump conditions (9) can be rendered continuous by an 
appropriate coordinate mapping S: 

$$S: (\tau_1, \tau_2, n) \to (\tilde{\tau}_1, \tilde{\tau}_2, \tilde{n}); \quad \tilde{\tau}_1 = \tau_1, \ \tilde{\tau}_2 = \tau_2, \ \tilde{n} = n/\sigma$$ (11) 

where $(\tau_1, \tau_2, n)$ is an orthogonal system, $\tau_1$, $\tau_2$ being tangential to the 
(sufficiently smooth) interface boundary $\Gamma$ and $n$ normal to that boundary, with 
$n = 0$ corresponding to $\Gamma$. Consider two specific but most practical cases: 

Case 1. A plane interface boundary. Let $R$ be the (constant) orthogonal 
rotation matrix transforming $(x, y, z)$ to $(\tau_1, \tau_2, n)$, i.e. 

$$(\tau_1, \tau_2, n)^T = R \cdot (x, y, z)^T$$ 

Coordinate stretching (11) in the $n$-direction can be associated with the 
diagonal matrix $S = \mathrm{diag}(1, 1, \sigma^{-1})$ and the overall coordinate transformation 
- with a symmetric positive definite matrix 

$$T = R^T S R$$ (12) 

Note that $T$ depends on coordinates through the dependence of $S$ on $\tau_1$, $\tau_2$ 
and, most importantly, on $n$. For any function $\psi$, one can define 
the transformed function $\tilde{\psi}$ as 

$$\tilde{\psi}(\zeta) = \psi(T\zeta), \quad \forall \zeta = (x, y, z)^T \in \mathbb{R}^3$$ (13) 

It is straightforward to verify that if $\psi$ has continuous derivatives, then $\tilde{\psi}$ 
satisfies the derivative jump condition (9) at the interface. 
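
The verification mentioned above can also be done numerically. The sketch below assumes a plane interface through the origin, a piecewise-constant parameter (values `sigma_minus` and `sigma_plus` on the two sides) and an arbitrary smooth test function; the helper names are hypothetical. It builds T = R^T S R with S = diag(1, 1, 1/sigma) and compares the one-sided fluxes sigma * d(psi_tilde)/dn by finite differences.

```python
import numpy as np

def rotation_to_interface_frame(normal):
    """Orthogonal matrix R whose rows are tau1, tau2, n (n along `normal`)."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Any vector not parallel to n can seed the tangential pair.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    tau1 = np.cross(n, helper)
    tau1 /= np.linalg.norm(tau1)
    tau2 = np.cross(n, tau1)
    return np.vstack([tau1, tau2, n])

def transformed(psi, R, sigma_minus, sigma_plus):
    """Return psi_tilde(zeta) = psi(T zeta) with T = R^T S R, S = diag(1, 1, 1/sigma);
    the interface is the plane n = 0 through the origin."""
    def psi_tilde(zeta):
        n_coord = R[2] @ zeta
        sigma = sigma_plus if n_coord > 0 else sigma_minus
        T = R.T @ np.diag([1.0, 1.0, 1.0 / sigma]) @ R
        return psi(T @ zeta)
    return psi_tilde

psi = lambda p: p[0] + 2.0 * p[1] + 3.0 * p[2] + p[0] * p[1]   # any smooth function
R = rotation_to_interface_frame([1.0, 2.0, 2.0])
sigma_minus, sigma_plus = 1.0, 10.0
psi_t = transformed(psi, R, sigma_minus, sigma_plus)

point = 0.3 * R[0] - 0.2 * R[1]          # a point on the interface (n = 0)
eps = 1e-6
dn_plus = (psi_t(point + eps * R[2]) - psi_t(point)) / eps
dn_minus = (psi_t(point) - psi_t(point - eps * R[2])) / eps
flux_plus, flux_minus = sigma_plus * dn_plus, sigma_minus * dn_minus
```

The normal derivatives `dn_plus` and `dn_minus` differ by the ratio of the material parameters, whereas the two fluxes coincide up to the finite-difference error, which is the jump condition (9).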



Case 2. A spherical interface boundary with a radius $r_0$. For simplicity, 
let the material parameter $\sigma$ be piecewise-constant and equal to $\sigma_{in}$ inside 
the sphere and $\sigma_{out}$ outside. Consider the coordinate ‘stretching’ in the radial 
direction for points outside the sphere: 

$$\tilde{r} = \begin{cases} r, & r \le r_0 \\ r_0 + \dfrac{\sigma_{in}}{\sigma_{out}} (r - r_0), & r \ge r_0 \end{cases}$$ (14) 

If a function $\psi(r, \theta, \varphi)$ has continuous derivatives, the transformed function 
$\tilde{\psi}(r, \theta, \varphi) = \psi(\tilde{r}, \theta, \varphi)$ satisfies the interface condition (9). In Cartesian 
coordinates, $\tilde{\psi}$ can be written as 

$$\tilde{\psi}(x, y, z) = \psi(\tilde{x}, \tilde{y}, \tilde{z}), \quad \text{with}\ \tilde{x} = x\tilde{r}/r, \ \tilde{y} = y\tilde{r}/r, \ \tilde{z} = z\tilde{r}/r$$ (15) 

In a hexahedral cell $[x_1, x_2] \times [y_1, y_2] \times [z_1, z_2]$ containing the spherical 
interface, one may choose, for example, a set of eight basis functions trilinear 
with respect to the transformed coordinates $\tilde{x}$, $\tilde{y}$, $\tilde{z}$: 

$$\tilde{\psi}_1 = (\tilde{x}_2 - \tilde{x})(\tilde{y}_2 - \tilde{y})(\tilde{z}_2 - \tilde{z}), \ \text{etc.}$$ (16) 

where the transform of parameters $x_2$, $y_2$, $z_2$ is theoretically unimportant but 
tends to make the basis better conditioned. The $\tilde{\psi}$’s, being smooth functions 
of $\tilde{x}$, $\tilde{y}$, $\tilde{z}$, have the necessary derivative jumps across the spherical interface 
in the original coordinates. 

This construction is not affected by the relative size or position of the 
spherical boundary with respect to the grid cell. In particular, the approxi- 
mating functions are applicable even to spherical particles contained entirely 
within one cell - quite a desirable feature for problems with small particles 
[31]. One numerical example of this kind is given in Section 4.3. 
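
As an illustration of Case 2, the sketch below implements the radial stretching (14), its Cartesian form (15) and the first basis function of (16), and then checks the flux condition (9) at a point of the spherical interface by one-sided finite differences. It is a sketch under stated assumptions: the sphere center is passed explicitly (formulas (14)-(15) are written for a sphere centered at the origin), and the cell, radius and material parameters are illustrative values rather than data from the numerical experiments.

```python
import numpy as np

def r_tilde(r, r0, sigma_in, sigma_out):
    """Radial stretching (14)."""
    return np.where(r <= r0, r, r0 + sigma_in / sigma_out * (r - r0))

def map_point(p, center, r0, sigma_in, sigma_out):
    """Cartesian form (15): scale the offset from the sphere center by r_tilde / r."""
    d = np.asarray(p, dtype=float) - center
    r = np.linalg.norm(d)
    if r == 0.0:
        return np.asarray(p, dtype=float)
    return center + d * (r_tilde(r, r0, sigma_in, sigma_out) / r)

def psi1_tilde(p, corner2, center, r0, sigma_in, sigma_out):
    """First basis function of (16); corner2 = (x2, y2, z2) is mapped as well."""
    q = map_point(p, center, r0, sigma_in, sigma_out)
    c = map_point(corner2, center, r0, sigma_in, sigma_out)
    return np.prod(c - q)

center = np.array([0.5, 0.5, 0.5])
r0, sigma_in, sigma_out = 0.05, 10.0, 1.0
corner2 = np.array([0.6, 0.6, 0.6])        # upper corner of the cell [0.4, 0.6]^3

e = np.array([1.0, 2.0, 2.0]) / 3.0        # unit radial direction
s = center + r0 * e                        # point on the spherical interface
eps = 1e-6
f = lambda p: psi1_tilde(p, corner2, center, r0, sigma_in, sigma_out)
d_in = (f(s) - f(s - eps * e)) / eps       # radial derivative from inside
d_out = (f(s + eps * e) - f(s)) / eps      # radial derivative from outside
flux_in, flux_out = sigma_in * d_in, sigma_out * d_out
```

Note that the sphere here (radius 0.05) lies entirely inside a cell of size 0.2, the situation described in the paragraph above; the construction does not change.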



3.3 Variational Formulations 

Over each hexahedral element $\Omega_i$, one can introduce the approximation space 
$V_{hi} = \mathrm{span}\{\psi_j^{(i)}\}$, $j = 1, 2, \ldots, 8$. A conforming FE method could be obtained 
by partition of unity, as was done in [31] for a different choice of special 
functions $\psi$. GFEM-PU does introduce additional complexity, and a much 
simpler nonconforming method is therefore worth exploring. The FE space 
is defined as 

$$V_h = \{v_h : v_h|_{\Omega_i} \in V_{hi}, \ v_h \ \text{continuous at the nodes of the grid}\}$$ (17) 

This space is nonconforming because the continuity is imposed only at the 
nodes and not over all interelement boundaries. $V_h$ is a finite-dimensional 
subspace of the broken Sobolev space [29] 

$$H^1(\cup\Omega_i) = \{v \in L_2(\Omega) : v|_{\Omega_i} \in H^1(\Omega_i)\}$$ 

with the energy inner product 

$$(u, v)_{\sigma\cup} = \sum_i \int_{\Omega_i} \sigma \nabla u \cdot \nabla v \, d\Omega \ \text{in}\ H^1(\cup\Omega_i) \times H^1(\cup\Omega_i)$$ (18) 

where the union symbol in the subscripts is indicative of $\cup\Omega_i$. The energy 
inner product over the whole domain $\Omega$ is, as before, 

$$(u, v)_{\sigma} = \int_{\Omega} \sigma \nabla u \cdot \nabla v \, d\Omega \ \text{in}\ H^1(\Omega) \times H^1(\Omega)$$ (19) 

For any pair of functions that are actually in $H^1(\Omega)$, the $\sigma$-product and the 
partitioned $\sigma\cup$-product are obviously identical; but for functions 
discontinuous across interelement boundaries the $\sigma$-product is undefined. It is known 
that $(u, u)_{\sigma\cup}$ not only is strictly positive (for nonzero $u$) and hence defines a 
valid inner product, but also satisfies the discrete Friedrichs inequality with 
a constant independent of the mesh size [19], [23], [9]. The continuous 
variational problem is formulated as 

$$(u, u')_{\sigma} = (f, u'), \quad u, u' \in H_0^1(\Omega)$$ (20) 

Since $(u, u')_{\sigma} = (u, u')_{\sigma\cup}$ in $H_0^1(\Omega)$, (20) can also be written as 

$$(u, u')_{\sigma\cup} = (f, u'), \quad u, u' \in H_0^1(\Omega)$$ (21) 

The discrete problem is 

$$(u_h, u_h')_{\sigma\cup} = (f, u_h'), \quad u_h, u_h' \in V_h \subset H^1(\cup\Omega_i)$$ (22) 



Note that there are no interface surface integrals or other penalty factors for 
conformity involved. The main difference with the standard FEM is in the 
energy inner product in the broken Sobolev space - this product is computed 
element-wise due to possible discontinuities of functions across the 
interelement boundaries. It is also important that $H^1(\cup\Omega_i) \not\subset H_0^1(\Omega)$ and therefore 
the test function $u_h'$ in (22) in general does not lie in the Sobolev space of 
the original continuous problem. 



3.4 Stiffness Matrices and the Discrete System 



Unlike conventional FE nodal bases, the approximating functions $\psi$ do not 
generally satisfy the Kronecker-delta conditions at the eight nodes. Let $N$ be 
the matrix of the nodal values 

$$N = \begin{pmatrix} \psi_1(r_1) & \psi_2(r_1) & \cdots & \psi_8(r_1) \\ \psi_1(r_2) & \psi_2(r_2) & \cdots & \psi_8(r_2) \\ \vdots & \vdots & & \vdots \\ \psi_1(r_8) & \psi_2(r_8) & \cdots & \psi_8(r_8) \end{pmatrix}$$ (23) 






where $r_j$ is the position vector of the $j$-th node. Then for any function 
$u_h = \sum_j c_j \psi_j = c^T \psi$, where $c$ is a coefficient vector, the respective vector 
$u_{\mathrm{nodal}}$ of nodal values of $u_h$ is 

$$u_{\mathrm{nodal}} = N c$$ (24) 

Assuming that $N$ is nonsingular, it follows immediately from (24) that each 
column of $N^{-1}$ constitutes a coefficient vector transforming the $\{\psi\}$ basis 
into the Kronecker-delta nodal basis $\{\varphi\}$. If both bases are arranged as 
column matrices $\psi$ and $\varphi$, then 

$$\varphi = N^{-T} \psi, \quad \text{with}\ \varphi_j(r_k) = \delta_{jk}$$ (25) 

The transform between the Gram matrices $G_\psi$ and $G_\varphi$ of the two bases is 

$$G_\varphi = N^{-T} G_\psi N^{-1}$$ (26) 

To each node $k$ of the mesh there corresponds, in an obvious way, a basis 
function $\varphi_k$ that is equal to $\varphi_k^{(i)}$ within each element $i$ adjacent to node $k$. 
Since no matching conditions between the bases in the element-wise spaces 
have been introduced, $\varphi_k$ will in general be discontinuous across interelement 
boundaries. 
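
The passage from the $\psi$ basis to the nodal basis can be sketched for a single 2D cell. The example assumes a unit square cell with a vertical interface at a hypothetical position `x_int` and four local functions that are bilinear in the stretched coordinates; none of these specific choices come from the paper. It builds the matrix of nodal values as in (23), forms the nodal basis by inverting it as in (24)-(25), and evaluates that basis at the nodes.

```python
import numpy as np

x_int, s_left, s_right = 0.3, 1.0, 10.0   # hypothetical interface and sigmas

def xt(x):
    """Stretched coordinate: n / sigma on each side of the interface x = x_int."""
    return (x - x_int) / (s_left if x < x_int else s_right)

# Four local functions, bilinear in (xt, y); they do not form a nodal basis.
psi = [lambda x, y: 1.0,
       lambda x, y: xt(x),
       lambda x, y: y,
       lambda x, y: xt(x) * y]

nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Matrix (23): row k holds the values of all psi_j at node r_k.
N = np.array([[p(x, y) for p in psi] for (x, y) in nodes])
N_inv_T = np.linalg.inv(N).T

def phi(x, y):
    """Nodal basis: phi = N^{-T} psi, evaluated at the point (x, y)."""
    return N_inv_T @ np.array([p(x, y) for p in psi])

# Row k contains phi_1(r_k), ..., phi_4(r_k).
nodal_values_of_phi = np.array([phi(x, y) for (x, y) in nodes])
```

The array `nodal_values_of_phi` is the identity matrix, i.e. the Kronecker-delta property; and because the constant function belongs to the local space, the four nodal functions also sum to one at any point of the cell.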

The stiffness matrix corresponding to the left hand side of (22) is formed 
in the usual way. First, one notes a one-to-one correspondence between 
Euclidean vectors of nodal values $u \in \mathbb{R}^n$ and the respective functions $u_h \in V_h$: 

$$u_h = \sum_{k=1}^{n} u_k \varphi_k$$ (27) 

Further, let $I: \mathbb{R}^n \to V_h$ be the interpolation operator defined by (27). The 
energy inner products in $V_h \times V_h$ have their counterparts in $\mathbb{R}^n \times \mathbb{R}^n$: 

$$(u_h, u_h')_{\sigma\cup} = \Big(\sum_{k=1}^{n} u_k \varphi_k, \ \sum_{l=1}^{n} u_l' \varphi_l\Big)_{\sigma\cup} = (Lu, u') \equiv (u, u')_L$$ (28) 

where $u$, $u'$ are Euclidean vectors in $\mathbb{R}^n$ and $L$ is the stiffness matrix with 
the elements 

$$L_{kl} = (\varphi_k, \varphi_l)_{\sigma\cup}$$ 

Consequently, the discrete formulation (22) can be rewritten in an equivalent 
form 

$$(u, u')_L = (f, I u') \quad \forall u' \in \mathbb{R}^n$$ (29) 

The stiffness matrices in the $\psi$- and nodal bases are related, due to (24), by 
the congruence transform 

$$L_N = N^{-T} L_\psi N^{-1}$$ (30) 

where $L_N$ stands for the nodal stiffness matrix with the entries $(\sigma\nabla\varphi_i, \nabla\varphi_j)$ 
and the entries of $L_\psi$ are 

$$L_{\psi ij} = (\sigma \nabla\psi_i, \nabla\psi_j)$$ (31) 

If the material interface boundary is plane, the entries of $L_\psi$, and 
consequently $L_N$, can be found analytically. For a curved boundary and/or higher 
order approximation, numerical quadratures may be unavoidable. 
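
A one-dimensional analogue shows what the transform between the two stiffness matrices produces when the integrals can be taken analytically. The sketch assumes a single element [0, h] with one interface inside it, piecewise-constant $\sigma$ and the two local functions $\psi_1 = 1$, $\psi_2 = \tilde{x}$ with $d\tilde{x}/dx = 1/\sigma$; since $u_{\mathrm{nodal}} = Nc$, the energy $c^T L_\psi c$ becomes $u^T N^{-T} L_\psi N^{-1} u$ in terms of nodal values. All numerical values are illustrative.

```python
import numpy as np

h, x_int = 0.2, 0.05          # element [0, h] with an interface at x_int (assumed)
s1, s2 = 10.0, 1.0            # sigma left / right of the interface (assumed)

# psi_1 = 1, psi_2 = x_tilde with d(x_tilde)/dx = 1/sigma and x_tilde(0) = 0.
D = x_int / s1 + (h - x_int) / s2            # x_tilde(h) = integral of dx / sigma

# Entries of the psi-basis stiffness matrix, integrated analytically:
# only psi_2 has a nonzero derivative, and the integral of
# sigma * (1/sigma)^2 over the element equals D.
L_psi = np.array([[0.0, 0.0],
                  [0.0, D]])

# Matrix of nodal values for the two end nodes x = 0 and x = h.
N = np.array([[1.0, 0.0],
              [1.0, D]])

N_inv = np.linalg.inv(N)
L_N = N_inv.T @ L_psi @ N_inv                # nodal stiffness matrix
```

The result is the familiar two-node difference stencil (1/D) [[1, -1], [-1, 1]], where h/D is the harmonic average of $\sigma$ over the element. In this 1D setting the variational construction therefore reproduces harmonic averaging rather than the arithmetic (volume) averaging of Section 2.5.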

Convergence analysis can be adapted, with some changes, from [28] and is 
outlined in Appendix I. Please see the Discussion section for further remarks. 

4 Numerical Results 

4.1 2D Problem: Cylinder in an External Field 

As the first and simplest illustrative example, let us consider a dielectric 
cylinder placed in an external field (Fig. 4.1). The domain is the unit square 
[0,1] X [0,1], with the inhomogeneous Dirichlet condition u = x imposed on 
its boundary (this condition would correspond to a uniform applied field if the 
domain were much larger). The particular results below are for the material 
parameter $\sigma = 10$ of the cylinder and for $\sigma = 1$ outside the cylinder. The 
cylinder axis is at (0.52, 0.49), and its radius is 0.15. 

Fig. 4.1 shows the potential distribution as a function of x along the line y 
= 0.5 for several grids. Even for a very crude 4 × 4 grid, in no way capable of 
capturing the cylindrical boundary, the computed values of nodal potentials 
are quite accurate. Problems with small particles (e.g. for nanotechnological 
applications [31]) would require a high level of hp mesh refinement around 
the particles in standard FEM or FD methods. In FED, a coarse grid 
suffices because the special approximating functions with jumps represent the 
behavior of the potential well. A 3D example of this nature is given in the 
following sections. 

Fig. 1. Field lines for a cylindrical particle (a 2D problem). Good results obtained 
with FED even on a very coarse 4 × 4 grid shown. 

[Plot legend: Matlab/pdetool; FED on 4 × 4, 8 × 8, 16 × 16, 32 × 32 and 64 × 64 grids.] 

4.2 A 3D Problem: Magnetic Particles Near a Magnetized 
Substrate 

This test problem is related to multiparticle assembly processes in nanotech- 
nology and was previously solved using Generalized FEM [31], albeit with 
different approximating functions. As an additional point of comparison, a 
FEMLAB¹ simulation was also performed. 

Three spherical particles are located near a magnetized substrate. The 
computational domain is a unit cube, with homogeneous Dirichlet condition 
at its surface. The radii of all particles are equal to ro = 0.05; the particles are 
centered at (0.3, 0.65, 0.5), (0.5, 0.65, 0.5) and (0.7, 0.65, 0.5), respectively. 
The relative permeabilities of the particles are assumed to be $\mu_r = 10$. The 
substrate is a parallelepiped, [0.4, 0.6] x [0.45, 0.5] x [0.4, 0.6], with unit 
magnetization in the x-direction. The magnetic scalar potential distribution 
is illustrated in Fig. 4.1 produced by FEMLAB. 

The results obtained by FED and FEMLAB are in good agreement (Fig. 
4.2). Note that the grid size in FED is equal to the radius of the particles, i.e. 
not at all sufficient to resolve the geometry of the spherical boundary. This 
is possible because the behavior of the solution at particle boundaries is an- 
alytically incorporated into the FED approximation. The popular but more 
basic schemes that rely on simple averaging of the material parameter in the 
volume of the cell or along its edges (see Section 2.5) cannot produce a com- 
parable result; for example, the volume- averaging scheme yields a smoothed 
potential distribution (Fig. 4.2). 



4.3 3D Particles Smaller Than the Grid Cell 

Special approximating functions representing derivative jumps at interfaces 
make the numerical solution adequate even if fine geometric details are not 
resolved on the grid. It is even feasible to solve problems with particles much 
smaller than a grid cell. For illustration, several problems with a varying 
radius of a spherical particle with $\sigma = 10$ in the unit cube were solved on a 
fixed 5 × 5 × 5 mesh (i.e. h = 0.2). The particle was placed a little off-center 
of the domain to eliminate possible symmetry artifacts. To make an exact 
solution available for validation, Dirichlet boundary conditions were set to 
correspond to the well-known potential of a sphere in a uniform field (the 
induced potential of a sphere has only the dipole component). 
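
For reference, the classical potential of a sphere in a uniform field can be coded in a few lines. The sketch below is an illustration, not the validation code of the paper: it assumes the undisturbed potential is u = x measured from the sphere center, with the material parameter $\sigma_{in}$ inside and $\sigma_{out}$ outside, and the center position is a hypothetical off-center value. It also checks the flux condition (9) on the sphere by finite differences.

```python
import numpy as np

def sphere_in_uniform_field(p, center, r0, sigma_in, sigma_out):
    """Potential of a sphere in a uniform field; far from the sphere u -> x - x_center.
    Inside the field is uniform; outside a dipole term is added."""
    d = np.asarray(p, dtype=float) - np.asarray(center, dtype=float)
    r = np.linalg.norm(d)
    k = (sigma_in - sigma_out) / (sigma_in + 2.0 * sigma_out)
    if r <= r0:
        return (1.0 - k) * d[0]
    return (1.0 - k * (r0 / r) ** 3) * d[0]

center = np.array([0.53, 0.48, 0.51])      # slightly off-center (assumed values)
r0, sigma_in, sigma_out = 0.05, 10.0, 1.0
u = lambda p: sphere_in_uniform_field(p, center, r0, sigma_in, sigma_out)

e = np.array([1.0, 2.0, 2.0]) / 3.0        # unit radial direction
s = center + r0 * e                        # point on the sphere
eps = 1e-6
flux_in = sigma_in * (u(s) - u(s - eps * e)) / eps
flux_out = sigma_out * (u(s + eps * e) - u(s)) / eps
u_far = u(center + np.array([10.0, 0.0, 0.0]))
```

Evaluating such a function at the boundary nodes of the cube gives Dirichlet data for which the exact solution is known everywhere, so nodal errors can be measured directly.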

For the radius of the particle $r_0 = 0.05$, i.e. four times smaller than 
the mesh size, the error in the nodal values of the potential in the 
cross-section z = 0.5 is plotted in Fig. 4.3a. For comparison, a similar plot of the 
potential obtained by volume-averaging of the material parameter is shown 
in Fig. 4.3b. The error for volume-averaging is an order of magnitude higher 
than for FED (note the scaling factors in the figures); this is not surprising 
because for volume-averaging the solution obviously does not even depend 
on the position of the particle within the cell. 

¹ FEMLAB is a registered trademark of Comsol, Inc. 

[Plot: spatial mapping vs. volume averaging; potential along the line through particle 
centers (y = 0.65, z = 0.5). Legend: spatial mapping; volume averaging; FEMLAB.] 

Fig. 4. Three spherical particles near a magnetized island: potential as a function 
of x at y = 0.65, z = 0.5. 

5 Future Development: the Framework of Multivalued 
Approximation 

The method described above provides very reasonable accuracy even for fairly 
coarse regular grids that do not have to resolve the geometric details. Never- 
theless the approach has two main drawbacks. First, as in GFEM, the com- 
putation of matrix entries in (31) requires numerical quadratures over 3D 
regions of complex shape (such as, for example, the intersection of a spher- 
ical particle with a hexahedral grid cell). Second, the optimal convergence 
rate with respect to the mesh size cannot generally be expected. 

Of these two drawbacks, the second one is alleviated by the fact that 
we are interested in obtaining reasonable accuracy on relatively coarse grids, 
when the asymptotic rate may not be relevant. The first drawback is more 
serious, as numerical quadratures in 3D carry a large computational overhead. 

Fig. 5. Numerical errors for the potential of a spherical particle in the unit cube 
(plane z = 0.5). Left: volume averaging, scaling factor 10^(…). Right: FED with a 
spatial mapping, scaling factor 10^(…). 



We now briefly describe a new framework that eliminates the unpleas- 
ant volume integration while preserving the best features of the previous 
approach. The main ideas are listed below, and the details will be given in 
[43]. 

A New Framework: Multivalued Approximation 

- A system of overlapping patches is introduced as before. The nodal 
values on the grid are single-valued and provide the necessary “information 
transfer” between the overlapping patches. The transformation between 
the nodal values of the numerical solution and its expansion in terms of the 
approximating functions is given by the same matrix $N$ as before (23). 

- Any relevant approximating functions can be used within each patch, 
independently of other patches. (Examples: singular Coulomb potentials; 
functions generated by spatial mappings to satisfy the jump conditions 
on material interfaces.) 

- Simple regular grids can be used. 

- When patches overlap, the approximation is generally multivalued. Not 
only is this not problematic, as long as all approximations converge to 
the exact solution, but it can be turned into an advantage: the discrepancy 
between multiple numerical values available in the intersection of patches 
may serve as an a posteriori error estimator. (The well-known Bank-Weiser 
error indicator based on the jumps of normal derivatives across 
element boundaries is a direct analogy.) 

- Since a unique globally continuous interpolant is not defined, the Galerkin 
method in $H^1(\Omega)$ is generally not applicable. However, within each patch 
there is a sufficiently smooth local approximation (3), and a general 
moment (weighted residual) method can be applied, provided that the 
support of the test function is contained entirely within the patch. 

- In particular, by introducing the standard ‘control volume’ centered at a 
given node of the grid and setting the test function equal to one within 
that control volume and zero elsewhere, one arrives at a flux balance 
scheme. This is a generalization of the standard ‘control volume’ 
technique to any set of suitably defined local approximating functions. Only 
surface integrals, rather than volume quadratures, are needed, which 
greatly reduces the computational overhead. 
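
The flux balance idea in the last item can be sketched in one dimension. This is a deliberately simplified illustration and not the scheme of [43], whose details are not given here: it assumes a 1D grid, a single interface inside one cell, piecewise-constant $\sigma$, zero right-hand side, and the two local functions 1 and $\tilde{x}$ fitted to the two nodal values adjacent to each control-volume face. With these assumptions the flux $\sigma u'$ of the local approximation is constant between two neighboring nodes, so only face values (the 1D counterpart of surface integrals) are needed. The function name is hypothetical.

```python
import numpy as np

def solve_flux_balance_1d(nodes, x_int, s_left, s_right, u_left, u_right):
    """Flux balance for (sigma u')' = 0 on a 1D grid with one interface at x_int.
    Between nodes i and i+1 the local approximation is a + b * x_tilde, so its
    flux sigma * u' equals b = (u[i+1] - u[i]) / (xt[i+1] - xt[i])."""
    nodes = np.asarray(nodes, dtype=float)
    xt = np.where(nodes < x_int, nodes / s_left,
                  x_int / s_left + (nodes - x_int) / s_right)
    n = len(nodes)
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0                 # Dirichlet rows
    b[0], b[-1] = u_left, u_right
    for i in range(1, n - 1):
        g_minus = 1.0 / (xt[i] - xt[i - 1])   # flux coefficient at the left face
        g_plus = 1.0 / (xt[i + 1] - xt[i])    # flux coefficient at the right face
        # Flux through the right face minus flux through the left face is zero.
        A[i, i - 1], A[i, i], A[i, i + 1] = -g_minus, g_minus + g_plus, -g_plus
    return np.linalg.solve(A, b), xt

nodes = np.linspace(0.0, 1.0, 6)              # the interface at 0.33 is not a node
u, xt = solve_flux_balance_1d(nodes, 0.33, 1.0, 10.0, 0.0, 1.0)
```

In this simple setting the nodal values coincide with the exact solution u = x_tilde(x) / x_tilde(1), although the grid does not resolve the interface.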

6 Discussion 

Inhomogeneous static problems can be solved either on geometrically con- 
forming meshes, possibly with hp-refinement that has now become conven- 
tional in FE methodology, or, alternatively, on regular meshes with additional 
approximating functions representing derivative jumps on material interfaces. 
The trade-off is between the complexity of mesh generation and adaption on 
the one hand, and the burden of numerical quadratures on the other. 

The paper reviewed the most promising ways to construct “Finite Ele- 
ment Difference” algorithms that are based on variational FE principles but 
produce FD-like schemes on regular grids. One such algorithm was devel- 
oped as a generalization of the variational-difference homogenization scheme 
proposed in [28]. Analysis then revealed that this algorithm is equivalent to 
a nonconforming FE method in broken Sobolev spaces on regular grids. Al- 
though the convergence rate of this method in the energy norm is suboptimal 
with respect to h, practical simulation results are quite reasonable because 
in many cases, and in particular for problems with small particles, the mesh 
size is far from the asymptotic regime. 

An even more promising alternative is the new framework of multivalued
approximation that has been briefly described in the previous section and 
will be further explored elsewhere (e.g. [43]). The most interesting application 
areas for this class of schemes include macromolecular modeling in solvents 
[4], [16], where the interface boundaries are geometrically extremely complex 
[4] but the solute region admits a rather simple analytical definition as a 
combination of spheres. The proposed approach provides a way to solve this 
type of problem accurately on relatively coarse regular grids.

In the context of purely finite difference methods, a similar approach 
gives rise, for example, to numerical models of point charge singularities in 
electrostatics with virtually no loss of accuracy and no extra computational 
cost, as explained in [43]. This type of problem is critical for molecular
dynamics simulations of macromolecules [4], [16], [21]. 

Acknowledgment 

My communication with the research group from Technische Universität
Darmstadt was a major stimulating factor for pursuing the line of research
reflected in this paper. I would like to thank Markus Clemens, Rolf Schuhmann
and Thomas Weiland for very informative discussions. I am also grateful to
Prof. Susanne Brenner of the University of South Carolina for answering my
questions.



Appendix I. Convergence 

Let $u^*$ be the exact solution of the continuous problem (20) or (21) and
$u_h \in V_h$ be the solution to (22). We assume that $u^*$ belongs to a space
$W(\Omega) \subset H^1(\Omega)$ of potentials $u$ that admit an approximation $Au \in V_h$ such
that

$$\| u - Au \|_{a\cup} \le c \, \| u \|_W \, h^{p}, \qquad (32)$$

$$\| u - Au \| \le c \, \| u \|_W \, h^{q}, \qquad u \in W(\Omega), \qquad (33)$$

where the orders of approximation $p > 0$, $q > 0$ and the $L_2$-norm appears
without a subscript. In general terms, $W$ comprises functions that are rendered
sufficiently smooth by the spatial mapping of Section 3.2. An elaborate
mathematical analysis of approximation (32) in 2D is presented in [2], and
its technical details are quite involved. To avoid these mathematical complications,
we shall treat estimate (32) as an additional assumption.

Even though we have not assumed a special grid construction or a specific
approximation of gradients as in Moskow et al. [28], weak convergence
analysis can be adapted, with some changes, from [28]. Beware that in the
transformations below $a$- and $a\cup$-products are used interchangeably for
functions in $H_0^1(\Omega)$.

For any function $u' \in W(\Omega)$, consider the energy inner product

$$(Au^* - u_h, Au')_{a\cup} = (Au^*, Au')_{a\cup} - (u_h, Au')_{a\cup} \qquad (34)$$

$$= \{ (Au^*, Au')_{a\cup} - (u^*, u')_{a\cup} \} + \{ (u^*, u')_{a} - (u_h, Au')_{a\cup} \} \; \ldots$$

[the terms in the second curly brackets correspond to the continuous and
discrete variational problems (20, 22)]

$$\ldots = \{ (Au^*, Au')_{a\cup} - (u^*, u')_{a\cup} \} + \{ (f, u') - (f, Au') \} \qquad (35)$$

Due to the approximation property of $A$ for both $u^*$ and $u'$, i.e. $u^* \approx Au^*$
and $u' \approx Au'$ in the sense of (32), the last expression (35) can be rewritten
as the following estimate of weak convergence:

$$| (Au^* - u_h, Au')_{a\cup} | \le \left[ c(u^*) h^{p} + c(u') \, \| f \| \, h^{q} \right] \left( \| u^* \|_W + \| u' \|_W \right) \qquad (36)$$

A more detailed analysis shows that the convergence rate in the energy 
norm is suboptimal. 
Abstract. In refined network analysis, a compact network model is combined with
drift-diffusion models for the semiconductor devices which are part of the network,
in a multiphysics approach. For linear RLC networks containing diodes as distributed
devices, we construct a mathematical model that combines the differential-algebraic
network equations of the circuit with elliptic boundary value problems
modelling the diodes. For this mixed initial-boundary value problem of partial
differential-algebraic equations a first existence result is given, based on a
nonstandard application of Schauder’s fixed point theorem.



1 Introduction 

In semiconductor technology, each device is part of a more complex electrical 
circuit. The interaction between the single device and the whole circuit can 
be neglected whenever the scale of the device is larger than the scale of inter- 
connects. As spatial dimensions of semiconductor technology shrink steadily, 
accurate models of device/interconnect interactions are needed [1,3]. 

In this article, we concentrate on RLC networks with bipolar semiconduc- 
tor devices, such as diodes. Due to the very different time scales related to the 
relaxation of diodes to equilibrium and to the electric current in the network, 
it is appropriate to model the devices by stationary drift-diffusion equations
[7,8]. For the sake of simplicity, we derive appropriate coupling conditions for 
one-dimensional diodes and linear RLC networks, set up by Modified Nodal 
Analysis (MNA) [5,9]. This multiphysics approach yields a coupled system of 
(elliptic) partial differential equations (PDEs) and differential-algebraic equations
(DAEs), for short, a system of partial differential-algebraic equations
(PDAEs): the node potentials of the network define boundary conditions for 
the diode model and this causes each diode to produce a current flow, such 
that each diode acts as a voltage-defined current source for the electric net- 
work. 

This paper is mainly devoted to a short presentation of an existence result 
for this PDAE. Further details can be found in [2]. The issue of uniqueness is 
more delicate and will be addressed in a forthcoming paper. We recall that 
non-uniqueness is a general feature of drift-diffusion equations, for input data 
far from equilibrium (see [6-8,10] and the references therein). Nevertheless, 
from a physical point of view, we expect uniqueness when the circuit equations
enter into play. At the moment it does not seem conceivable to prove
such a result for general data, but only when the boundary conditions for the 
diodes remain close to equilibrium, following the analysis of the stationary 
drift-diffusion model [7]. 

To the author’s knowledge, a mathematical analysis of coupled systems 
of DAEs and PDEs has been performed only for linear PDEs. Specifically, 
an existence result has been obtained for RLC networks containing uniform
lossy transmission lines [3,4], which are described by the telegrapher’s
equations. This model leads to a system of linear hyperbolic PDAEs. 

The paper is organized as follows. In Section 2 we present the model, 
which will be analyzed in the subsequent section. Finally, we draw some 
conclusions and discuss several open problems. 



2 Network Models for Electrical Circuits 



2.1 Electrical RLC Networks 



An RLC network is an electrical circuit whose basic components are resistors,
inductors and capacitors. In addition, we assume that the network contains
semiconductor devices. Using Modified Nodal Analysis (MNA), such an RLC
network is described by certain given capacitance, inductance and conductance
matrices, $C \in \mathbb{R}^{n_C \times n_C}$, $L \in \mathbb{R}^{n_L \times n_L}$ and $G \in \mathbb{R}^{n_G \times n_G}$, which are
positive-definite and symmetric. The indices $n_C$, $n_L$ and $n_G$ denote the number
of capacitors, inductors and resistors, respectively.

If the network has $n$ nodes, the unknowns to be determined are the node
potentials $u \in \mathbb{R}^{n}$, the currents through inductive branches $j_L \in \mathbb{R}^{n_L}$, and
the currents through branches with voltage sources $j_V \in \mathbb{R}^{n_V}$. The input
data of the network are the independent voltage sources $v \in \mathbb{R}^{n_V}$ and the
independent current sources $i \in \mathbb{R}^{n_I}$. The currents through the Ohmic contacts
of the semiconductor devices, which depend on the applied potentials,
are $\lambda \in \mathbb{R}^{2 n_d}$, where $n_d$ is the number of devices. Finally, the network topology
is described by the incidence matrices $A_C \in \mathbb{R}^{n \times n_C}$, $A_L \in \mathbb{R}^{n \times n_L}$,
$A_R \in \mathbb{R}^{n \times n_G}$, $A_V \in \mathbb{R}^{n \times n_V}$, $A_I \in \mathbb{R}^{n \times n_I}$, and $A_\lambda \in \mathbb{R}^{n \times 2 n_d}$, for capacitors,
inductors, resistors, voltage sources, current sources and Ohmic contacts of
devices, respectively.

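Using Kirchhoff’s laws, it is possible to derive the network equations for
the circuit part,

$$\begin{aligned}
A_C C A_C^T \frac{du}{dt} + A_R G A_R^T u + A_L j_L + A_V j_V + A_\lambda \lambda + A_I i(t) &= 0, \\
L \frac{dj_L}{dt} - A_L^T u &= 0, \\
-A_V^T u + v(t) &= 0.
\end{aligned} \qquad (1)$$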
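To make the structure of the network equations concrete, the following sketch assembles the residual of the MNA system for a small circuit. The circuit is a hypothetical example, not taken from the text: a voltage source from node 1 to ground, a resistor between nodes 1 and 2, and a capacitor from node 2 to ground, with no inductors, current sources or devices. The function name `mna_residual` and all numerical values are assumptions made for illustration.

```python
import numpy as np

# Incidence matrices of the hypothetical example (rows = nodes, columns = branches).
A_C = np.array([[0.0], [1.0]])    # capacitor: node 2 to ground
A_R = np.array([[1.0], [-1.0]])   # resistor: node 1 to node 2
A_V = np.array([[1.0], [0.0]])    # voltage source: node 1 to ground
C = np.array([[1e-6]])            # capacitance matrix (farad)
G = np.array([[1e-3]])            # conductance matrix (siemens)

def mna_residual(u, du_dt, j_V, v):
    """Residual of the MNA equations for the example network.

    The first block is Kirchhoff's current law at the nodes, the second block
    fixes the node potentials at the voltage source.
    """
    kcl = A_C @ C @ A_C.T @ du_dt + A_R @ G @ A_R.T @ u + A_V @ j_V
    kvl = -A_V.T @ u + v
    return np.concatenate([kcl, kvl])
```

A state is consistent with the circuit exactly when the residual vanishes, for instance the DC state $u = (5, 5)^T$, $j_V = 0$ for a 5 V source.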
The unknowns of the RLC network equations (1) can be decomposed into
differential and algebraic components, after introducing the projector $Q_C$
onto the kernel of $A_C^T$, and the orthogonal projector $P_C = I - Q_C$. Then,
the differential component is given by $y = (y_1, y_2)^T = (P_C u, j_L)^T$ and
the algebraic component by $z = (z_1, z_2)^T = (Q_C u, j_V)^T$. The RLC network
equations can be rewritten as

$$A \frac{dy}{dt} + B_P y + C_P z + F_P(\lambda, i(t)) = 0, \qquad (2)$$

$$B_Q z + C_Q y + F_Q(\lambda, i(t), v(t)) = 0,$$

where 




ATQc 0 j’ 



Yp — _ I^Q^AaA + Q^A/tj 

The matrix $A$ is positive-definite and symmetric, since both $H = A_C C A_C^T +
Q_C^T Q_C$ and $L$ are positive-definite, symmetric matrices. The system (2) is
supplemented with initial data $y(t_0) = y_0$.
The topological conditions

$$\ker (A_C, A_R, A_V)^T = \{0\}, \qquad (3)$$

$$\ker Q_C^T A_V = \{0\}, \qquad (4)$$

$$\ker P_C^T A_\lambda = \{0\}, \qquad (Q_C^T A_\lambda = 0), \qquad (5)$$

imply that the matrix $B_Q$ is invertible, and $z$ can be expressed as a function
of $y$, so that the system (2) has algebraic index 1. The topological conditions
(3)-(5) are physically reasonable. In particular, condition (5) requires that
any device’s terminal is connected to ground through a path of capacitors.
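The projector onto the kernel of the transposed capacitor incidence matrix and the condition that this projector annihilates the device incidence matrix can be checked numerically. The sketch below is one possible construction, assuming an orthogonal projector built from a singular value decomposition; the helper name `kernel_projector` and the two-node example matrices are hypothetical.

```python
import numpy as np

def kernel_projector(M, tol=1e-12):
    """Orthogonal projector onto ker(M), built from an orthonormal kernel basis."""
    _, s, vt = np.linalg.svd(M)        # full SVD: vt is square
    rank = int(np.sum(s > tol))
    N = vt[rank:].T                    # columns span ker(M)
    return N @ N.T

# Hypothetical two-node network: one capacitor from node 2 to ground.
A_C = np.array([[0.0], [1.0]])
Q_C = kernel_projector(A_C.T)          # projector onto ker(A_C^T)
P_C = np.eye(2) - Q_C

# A device contact at node 2 is tied to ground through the capacitor,
# a contact at node 1 is not.
A_lam_ok = np.array([[0.0], [1.0]])
A_lam_bad = np.array([[1.0], [0.0]])
print(np.allclose(Q_C.T @ A_lam_ok, 0.0), np.allclose(Q_C.T @ A_lam_bad, 0.0))
```

Using the SVD rather than an explicit graph search keeps the sketch short; for large networks a topological construction would normally be preferred.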



2.2 Semiconductor devices and electrical networks 

In this subsection we deal with the device equations and with their coupling
to the network equations. We assume that an electrical circuit contains $n_d$
bipolar devices. We model the $\alpha$-th device as a segment of length $l$, characterized
by a doping profile $N_\alpha(x)$, $x \in (0, l)$. Then, each device is described
by the drift-diffusion equations

$$\begin{aligned}
\partial_x \left( q \mu_n n_\alpha \, \partial_x \phi_{n,\alpha} \right) &= -q R_\alpha, \\
\partial_x \left( q \mu_p p_\alpha \, \partial_x \phi_{p,\alpha} \right) &= q R_\alpha, \\
-\partial_x \left( \epsilon \, \partial_x V_\alpha \right) &= q \left( N_\alpha + p_\alpha - n_\alpha \right),
\end{aligned} \qquad (6)$$

where $\phi_{n,\alpha}$, $\phi_{p,\alpha}$ are the quasi-Fermi potentials, $V_\alpha$ is the electric potential,
and $n_\alpha$, $p_\alpha$ are the electron and hole densities. The electron and hole mobilities
$\mu_n$, $\mu_p$ are assumed to be given functions of the densities, the electric field,
and the space variable. The generation-recombination term, $R_\alpha$, is modelled
by

$$R_\alpha(n, p, V) = F_\alpha(n, p, V) \left( \frac{n p}{n_i^2} - 1 \right).$$

This expression includes both Shockley-Read-Hall and Auger generation-recombination
terms [6-8,10].

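In (6), the densities $n_\alpha$ and $p_\alpha$ will be expressed in terms of the quasi-Fermi
potentials and of the electric potentials $V_\alpha$, by

$$n_\alpha = n_i \exp\left( \frac{V_\alpha - \phi_{n,\alpha}}{U_T} \right), \qquad
p_\alpha = n_i \exp\left( -\frac{V_\alpha - \phi_{p,\alpha}}{U_T} \right),$$

where $U_T = k_B T / q$ is the thermal potential. In particular, the generation-recombination
term becomes

$$R_\alpha = R_\alpha(\phi_n, \phi_p, V) = F_\alpha(\phi_n, \phi_p, V) \left( \exp\left( \frac{\phi_p - \phi_n}{U_T} \right) - 1 \right).$$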
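The passage from the density form of the recombination term to the quasi-Fermi form is a one-line computation. A sketch, assuming the exponential relations read $n_\alpha = n_i \exp((V_\alpha - \phi_{n,\alpha})/U_T)$ and $p_\alpha = n_i \exp(-(V_\alpha - \phi_{p,\alpha})/U_T)$:

```latex
\begin{aligned}
\frac{n_\alpha p_\alpha}{n_i^2}
  &= \exp\!\left(\frac{V_\alpha - \phi_{n,\alpha}}{U_T}\right)
     \exp\!\left(-\frac{V_\alpha - \phi_{p,\alpha}}{U_T}\right)
   = \exp\!\left(\frac{\phi_{p,\alpha} - \phi_{n,\alpha}}{U_T}\right), \\
R_\alpha
  &= F_\alpha \left(\frac{n_\alpha p_\alpha}{n_i^2} - 1\right)
   = F_\alpha \left(\exp\!\left(\frac{\phi_{p,\alpha} - \phi_{n,\alpha}}{U_T}\right) - 1\right).
\end{aligned}
```

The electric potential cancels in the product, so the recombination term vanishes exactly when the two quasi-Fermi potentials coincide, which is the equilibrium situation imposed at the Ohmic contacts.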

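The end points of each device are Ohmic contacts, for which the following
conditions have to be imposed [7,8],

$$\phi_{n,\alpha}(0, t) = \phi_{p,\alpha}(0, t) = u_{1,\alpha}(t), \qquad V_\alpha(0, t) = V_{bi,\alpha}(0) + u_{1,\alpha}(t), \qquad (7)$$

$$\phi_{n,\alpha}(l, t) = \phi_{p,\alpha}(l, t) = u_{2,\alpha}(t), \qquad V_\alpha(l, t) = V_{bi,\alpha}(l) + u_{2,\alpha}(t), \qquad (8)$$

where $u_{1,\alpha}$, $u_{2,\alpha}$ are the applied potentials, and the built-in potential $V_{bi,\alpha}$ is
defined by

$$V_{bi,\alpha} = U_T \log\left( \frac{N_\alpha}{2 n_i} + \sqrt{ \left( \frac{N_\alpha}{2 n_i} \right)^2 + 1 } \right).$$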
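The built-in potential can be read as the potential that makes the contact charge-neutral at equilibrium. The sketch below checks this numerically, assuming the standard closed form $V_{bi} = U_T \,\mathrm{arsinh}(N/(2 n_i))$ (equivalent to the logarithmic expression) and vanishing quasi-Fermi potentials; the function names and the silicon-like parameter values are illustrative assumptions.

```python
import math

def built_in_potential(doping, n_i, u_t):
    """V_bi = U_T * asinh(N / (2 n_i)), i.e. U_T * log(N/(2 n_i) + sqrt((N/(2 n_i))**2 + 1))."""
    return u_t * math.asinh(doping / (2.0 * n_i))

def equilibrium_densities(v, n_i, u_t):
    """Electron and hole densities for zero quasi-Fermi potentials."""
    n = n_i * math.exp(v / u_t)
    p = n_i * math.exp(-v / u_t)
    return n, p

# Illustrative values (assumptions): n-type doping 1e16 cm^-3, n_i = 1e10 cm^-3.
N, n_i, U_T = 1.0e16, 1.0e10, 0.02585
n, p = equilibrium_densities(built_in_potential(N, n_i, U_T), n_i, U_T)
print(n - p, N)   # the space charge N + p - n vanishes up to rounding
```

The same check works for acceptor doping (negative $N$), since the inverse hyperbolic sine is odd.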



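The applied voltages $u_1 = (u_{1,1}, \ldots, u_{1,n_d})^T$ and $u_2 = (u_{2,1}, \ldots, u_{2,n_d})^T$ are
part of the unknowns of the network equations. Using the incidence matrix
$A_\lambda$, we can write the following coupling condition to the electrical network,

$$\begin{pmatrix} u_1(t) \\ u_2(t) \end{pmatrix} = A_\lambda^T u. \qquad (9)$$

Finally, the electrical currents $\lambda_1 = (\lambda_{1,1}, \ldots, \lambda_{1,n_d})^T$ and $\lambda_2 = (\lambda_{2,1}, \ldots,
\lambda_{2,n_d})^T$ at the Ohmic contacts are explicitly given by

$$\lambda_{1,\alpha} = -j_{n,\alpha}(0, t) - j_{p,\alpha}(0, t), \qquad \lambda_{2,\alpha} = j_{n,\alpha}(l, t) + j_{p,\alpha}(l, t), \qquad (10)$$

$$j_{n,\alpha} = q \mu_n n_\alpha \, \partial_x \phi_{n,\alpha}, \qquad j_{p,\alpha} = q \mu_p p_\alpha \, \partial_x \phi_{p,\alpha}.$$

The plus and minus signs in the definition of $\lambda_{1,\alpha}$ and $\lambda_{2,\alpha}$ take care of the
incoming directions to the device. For later reference, it is convenient to
introduce the vector $\lambda = (\lambda_1, \lambda_2)^T$.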
3 Existence of solutions

In this section, we prove an existence result for the coupled system of the
network equations and the drift-diffusion equations. The main difficulty resides
in the term $\lambda$, which is a nonlinear, differential functional of $A_\lambda^T u$. In
general, this functional cannot be expected to be continuous, due to the lack
of uniqueness of the drift-diffusion equations [7]. Therefore, the existence of
a solution to the DAE describing the electrical network is not ensured.

For simplicity, we assume that the circuit contains only one device. All
the results of this section can be easily extended to arbitrary $n_d$ devices.

Before stating the main result (Theorem 2), we collect in Lemma 1 some a
priori estimates which will be needed later.

Lemma 1. Let the given electric network provide symmetric, positive definite
matrices $C$, $L$, $G$, and let the network’s incidence matrices satisfy the
topological conditions (3)-(5). Furthermore, let $(y, z) \in (C([t_0, t_1]))^{n + n_L + n_V}$
be a solution to the corresponding network equation (2), with initial value $y_0$.
We assume that $i \in (L^2([t_0, t_1]))^{n_I}$, $v \in (L^2([t_0, t_1]))^{n_V}$, and that $\lambda$ satisfies
the condition

$$(A_\lambda^T u)^T \lambda \ge 0. \qquad (11)$$

Then, for all $t \in [t_0, t_1]$ the solution satisfies the estimates

$$|y|^2(t) \le c_y \left( |y_0|^2 + \| i \|^2_{(L^2([t_0,t_1]))^{n_I}} + \| v \|^2_{(L^2([t_0,t_1]))^{n_V}} \right), \qquad (12)$$

$$|z|^2(t) \le c_z \left( |y|^2(t) + |i|^2(t) + |v|^2(t) \right), \qquad (13)$$

for some positive constants $c_y$ and $c_z$.

Proof. The lemma follows from equation (2), after multiplying by $y^T$, using
the Schwarz inequality and Gronwall’s lemma (see [2]). □

We remark that no assumption is made on the functional structure of $\lambda$.
It is possible to check that the natural power condition (11) is satisfied by
$\lambda = (\lambda_1, \lambda_2)^T$, defined by (10).

Guided by Lemma 1, we introduce the Banach space

$$X = A \times A \times B, \qquad A = C([t_0, t_1], L^2([0, l])), \qquad B = C([t_0, t_1], \mathbb{R}^{n + n_L}),$$

and the subset $M \subset X$ containing all $(\phi_n, \phi_p, y) \in X$ which satisfy for all
$x \in [0, l]$, $t \in [t_0, t_1]$ inequalities (12) and

$$u_1(t) \wedge u_2(t) \le \phi_n(x, t), \phi_p(x, t) \le u_1(t) \vee u_2(t), \qquad (14)$$

and the coupling conditions (9), with $y = (y_1, y_2)^T$, $y_1(t) = P_C u(t)$. In (14),
the composition of two functions by the symbols $\wedge$ and $\vee$ denotes their minimum
and maximum function, respectively. It is not hard to see that $M$ is a
bounded, convex subset of $X$. We are ready to state the main result of this
paper. The remainder of this section is devoted to its proof.
Theorem 2. Existence. Let the source functions $i$ and $v$ be continuous, the
network matrices be symmetric, positive definite and the topological conditions
(3)-(5) be fulfilled. Then there exists a solution in $M$ to the coupled system
(2), (6), (7), (8), (9).



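Proof. The proof is based on Schauder’s fixed point theorem and requires
Lemmas 3 and 4 below. In order to define an iteration operator $T$ which
maps $M$ into itself, we consider a triplet $(\phi_n', \phi_p', y') \in M$. The operator will
be defined in two steps. First we consider the following problem,

$$\frac{\epsilon}{q} \, \partial_x^2 V = n(V, \phi_n') - p(V, \phi_p') - N,$$

$$V(0, \cdot) = V_{bi}(0) + u_1'(\cdot), \qquad V(l, \cdot) = V_{bi}(l) + u_2'(\cdot). \qquad (15)$$

We denote by $V'$ the solution to (15) (see Lemma 3). Next, we consider the
following linearized drift-diffusion equations,

$$\partial_x \left( \mu_n' n' \, \partial_x \phi_n \right) = -F' \cdot \left( \exp\left( \frac{\phi_p - \phi_n}{U_T} \right) - 1 \right),$$

$$\partial_x \left( \mu_p' p' \, \partial_x \phi_p \right) = F' \cdot \left( \exp\left( \frac{\phi_p - \phi_n}{U_T} \right) - 1 \right), \qquad (16)$$

$$\phi_n(0, \cdot) = \phi_p(0, \cdot) = u_1(\cdot), \qquad \phi_n(l, \cdot) = \phi_p(l, \cdot) = u_2(\cdot),$$

where the prime denotes evaluation at $(\phi_n', \phi_p', V')$. The system (16) is coupled
to the DAE

$$A \frac{dy}{dt} + B_P y + C_P z + F_P(i(t), \lambda) = 0,$$

$$B_Q z + C_Q y + F_Q(i(t), v(t)) = 0, \qquad (17)$$

$$y(t_0) = y'(t_0),$$

by the coupling conditions

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = A_\lambda^T u, \qquad
\lambda = \begin{pmatrix} -q \mu_n' n' \, \partial_x \phi_n - q \mu_p' p' \, \partial_x \phi_p \\ q \mu_n' n' \, \partial_x \phi_n + q \mu_p' p' \, \partial_x \phi_p \end{pmatrix}. \qquad (18)$$

We denote by $(\phi_n'', \phi_p'', y'')$ the solution to (16)-(18) (see Lemma 4). In this
way, we have constructed a map $T(\phi_n', \phi_p', y') = (\phi_n'', \phi_p'', y'')$. Using Lemmas
3 and 4 below, it is immediate to see that $T$ is continuous and maps $M$ into
itself. In order to apply Schauder’s fixed point theorem, we need to show
that $T(M)$ is relatively compact in $M \subset X$. We introduce the decomposition
$T(M) = M_1 \times M_2$, with

$$M_1 = \left\{ (\phi_n'', \phi_p'') \in C\left([t_0, t_1], (L^2([0, l]))^2\right) \; | \; (\phi_n'', \phi_p'', y'') \in T(M) \right\},$$

$$M_2 = \left\{ y'' \in C\left([t_0, t_1], \mathbb{R}^{n + n_L}\right) \; | \; (\phi_n'', \phi_p'', y'') \in T(M) \right\}.$$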
Recalling the estimate (12), and using the network equations to express
$dy''/dt$ in terms of $y''$, it is immediate to see that $y''$ belongs to a bounded subset of
$C^1([t_0, t_1], \mathbb{R}^{n + n_L})$. Therefore, $M_2$ is relatively compact in $C([t_0, t_1], \mathbb{R}^{n + n_L})$.

Next, the Rellich-Kondrachov theorem implies that the set

$$\left\{ w(\cdot, t) \; : \; w \in M_1 \right\}$$

is relatively compact in $L^2([0, l])^2$, for all $t \in [t_0, t_1]$. Since $w$ is Lipschitzian
with respect to $(u_1, u_2)$, for any $t \in [t_0, t_1]$ and for any $\epsilon > 0$ there
exists $\delta(\epsilon, t) > 0$ such that

$$\max_{w \in M_1} \| w(\cdot, t) - w(\cdot, s) \|_{L^2} < \epsilon$$

for all $s \in [t_0, t_1]$ with $|t - s| < \delta(\epsilon, t)$. Then, by invoking the Arzelà-Ascoli
theorem, $M_1$ is relatively compact in $C([t_0, t_1], L^2([0, l])^2)$.

Finally, after applying Tychonoff’s theorem, we can conclude that $T(M) =
M_1 \times M_2$ is relatively compact in $X$, and Schauder’s theorem implies the
existence of a fixed point of the map $T$. □



As we have mentioned, the proof of Theorem 2 rests on the following two
lemmas.



Lemma 3. If $\phi_n'$ and $\phi_p'$ satisfy the inequality (14), the problem (15) has for
any time $t \in [t_0, t_1]$ a unique solution $V = V'(\cdot, t)$ in $H^1([0, l])$. This solution
satisfies the estimate

$$\inf_{[0,l]} V_{bi} + u_1(t) \wedge u_2(t) \le V'(x, t) \le \sup_{[0,l]} V_{bi} + u_1(t) \vee u_2(t). \qquad (19)$$

The proof of this lemma can be found in [7] and [8].

Lemma 4. The problem (16)-(18) has a unique solution

$$(\phi_n, \phi_p, y) = (\phi_n'', \phi_p'', y'')$$

in $C([t_0, t_1], H^1([0, l]))^2 \times C([t_0, t_1], \mathbb{R}^{n + n_L})$. Moreover, the solution satisfies
the estimates (12), (14) and, therefore, belongs to $M$.

Proof. We give a sketch of the proof. The details can be found in [2]. First, it
is possible to see that the quasilinear problem (16) is solvable for any boundary
data $u_1$, $u_2$, and that any solution satisfies (14). This statement can be
verified by a standard application of Schauder’s fixed point theorem, following
the guideline of Lemma 3. Next, it can be shown that both the solution
and the current $\lambda$ are Lipschitz continuous with respect to the boundary
data. The Lipschitz continuity of the solution implies uniqueness and well-posedness
of the problem (16). Then, the solution is continuous with respect
to time. The Lipschitz continuity of $\lambda$ implies that the problem (17) has
a unique solution, by the Picard-Lindelöf theorem. Finally, recalling
Lemma 1, the solution satisfies (12), since it is possible to verify the power
condition $(A_\lambda^T u)^T \lambda \ge 0$. □
4 Open problems

We have proposed an existence result for a system of elliptic PDEs, describ- 
ing a set of semiconductor devices, nonlinearly coupled to a system of DAEs, 
describing a linear RLC network. The proof is based on a fixed point map, 
defined in an appropriate function space. In order to translate the idea of 
the proof into an efficient numerical scheme, we need to define a contractive 
iteration map, rather than a generic fixed point map. Therefore, the issue of 
uniqueness of the solution becomes particularly relevant, and it is strictly re- 
lated to the stability of any scheme based on such ideas. This is still an open 
problem. A second problem concerns the extension of the existence proof to 
evolutionary drift-diffusion and, possibly, hydrodynamical models for semi- 
conductor devices, involving time derivatives too. A third problem is related 
to the inclusion of thermal effects. Finally, we plan to extend the theory 
presented in these pages to multidimensional devices (rather than the strip 
model used here) coupled to circuits, and to similar coupling problems arising 
when parts of a complex system are replaced by spatially lower dimensional 
averaged models (system simulation, mixed-level simulation, etc.). 
Abstract. Electric circuit designers are frequently interested in the transient behaviour
of the designed circuit. A common method for time integration of the Differential
Algebraic circuit Equations (DAE) is the Backward Differentiation Formula
(BDF) method. In 1983, J. Cash proposed the Modified Extended BDF (MEBDF)
method, which combines better stability properties and higher order of convergence
than BDF, but requires more computations per step. We prove reduction of convergence
order for MEBDF when applied to DAE’s with higher DAE-index. However,
because in practice, in circuit analysis, the DAE-index does not exceed 2, the reduction
is quite moderate: the resulting order equals the BDF-order in that case. One gains better,
or even unconditional, stability. One also obtains consistent solutions.



1 Introduction 

In Circuit Design, Transient Analysis is most heavily used. There is constant 
interest in methods that offer better performance with respect to robustness 
as well as to reduction of CPU-time. Because the underlying circuit equations
are Differential-Algebraic Equations (DAE), a point for robustness is
how well the time integrator behaves when problems of higher DAE-index 
have to be treated: does reduction of order of convergence happen, and does 
one obtain consistent solutions. Other points of interest are stability condi- 
tions (see [8]), and the damping behavior along the imaginary axis. 
Backward Differentiation Formula (BDF) methods do not suffer from reduction
of order of convergence and generate consistent solutions [1,9,12], which
made them very popular for circuit simulation, however at the cost of being
only conditionally stable when the order exceeds 2. Improvements were sought
by combining BDF with the Trapezoidal Rule [10] (less damping), or in new
methods, such as Implicit Runge-Kutta methods of Radau type [11] (3rd
order L-stable Implicit Runge-Kutta methods with options for parallelism), or
CHORAL (embedded method of order (2)3, stiffly accurate and L-stable) [7].
In this paper, we focus on Modified Extended BDF (MEBDF) methods [3]
and we report on newly proven convergence order reduction when applied
to DAE’s of higher index and on the generation of consistent solutions. The
MEBDF-methods offer better stability conditions than BDF. For instance,
when applied to index-1 DAE’s, the 3-step method has order 4 and is A-stable
(see [9]). When applied to index-2 DAE’s, the order of convergence reduces
to 3, but the method’s stability property remains, which compares favorably
to the 3-step BDF method. The MEBDF-methods easily fit a data structure
that was designed for an implementation of BDF-methods. Variants also allow
parallelism [6].

We start with an explanation of the MEBDF method. After that we formu- 
late two new theorems on the order of the MEBDF method. We continue 
with some numerical results of an example circuit and present results for a 
general test problem. 



2 Modified Extended BDF 

One timestep with the MEBDF method consists of three BDF steps and
an evaluation step. This results in more work compared to BDF, but the
order of convergence increases by one for most circuits. This implies that
for convergence order 3 we normally apply the 3-step BDF method, while
with the MEBDF method a 2-step method suffices. Well-posed circuits lead
to an index-1 DAE. For this type of DAE the $k$-step MEBDF has order
$k+1$, while the $k$-step BDF has order $k$ (see Section 3 and [12]). Checking
the circuit topology enables one roughly to distinguish between index-1 and
index-2 problems [12]. Circuits that contain controlled elements require a
much deeper analysis [5]. Several numerical methods suffer from convergence
order reduction when applied to DAE’s with higher index. For this reason, one
is interested in time integration methods whose order of convergence is
robust with respect to the DAE-index. A second property that one
likes to see fulfilled is that the numerical solution satisfies the consistency
property for DAE’s: if one starts on the manifold defined by the algebraic
constraints one wants to stay on it. An appreciated property would be that
the method is able to generate a consistent solution even if one starts from
an inconsistent initial value.

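We consider the following quasi-linear DAE

$$A x'(t) + g(x(t), t) = 0, \qquad (1)$$

where $x(t) \in D \subset \mathbb{R}^m$, $t \in I \subset \mathbb{R}$ and $A$ is a constant $m \times m$ matrix.

MEBDF is defined by the following steps for integrating one timestep¹.

• First BDF step: Solve for $\bar{x}_\ell$ at $t_\ell$ by the $k$-step BDF-method²

$$A \sum_{i=0}^{k} \hat{a}_i x_{\ell-i} = -h \hat{\beta}_k \, g(x_\ell, t_\ell).$$

• Second BDF step: Compute an “extrapolated” solution at $t_{\ell+1}$ by BDF

$$A \sum_{i=0}^{k} \hat{a}_i x_{\ell+1-i} = -h \hat{\beta}_k \, g(x_{\ell+1}, t_{\ell+1}),$$

where $x_\ell = \bar{x}_\ell$. This solution is called $\bar{x}_{\ell+1}$.

• Evaluation step: Compute

$$\bar{g}_\ell = g(\bar{x}_\ell, t_\ell), \qquad \bar{g}_{\ell+1} = g(\bar{x}_{\ell+1}, t_{\ell+1}).$$

• MEBDF step: Solve for a new solution $x_\ell$ at $t_\ell$:

$$A \sum_{i=0}^{k} a_i x_{\ell-i} = -h \hat{\beta}_k \, g(x_\ell, t_\ell) - h (\beta_k - \hat{\beta}_k) \, \bar{g}_\ell - h \beta_{k+1} \, \bar{g}_{\ell+1}.$$

¹ For occurring coefficients $\hat{a}_0, \ldots, \hat{a}_k$, $\hat{\beta}_k$, $a_0, \ldots, a_k$, $\beta_k$, $\beta_{k+1}$ see [2,3].

² In $\bar{x}_\ell$, the subscript denotes that it is an approximation at time $t_\ell$ and the bar denotes that it is a
BDF solution; an additional index, suppressed here, denotes that it is computed in the $(\ell - k)$-th step.

For solving all equations we use a quasi-Newton method. This means that we
don’t compute a new Jacobian every iteration, but apply the same Jacobian
as for the $k$-step BDF during the several iteration steps.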
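The four stages can be sketched in code for the simplest case. The block below is an illustration under several assumptions that are not taken from the text: it treats the ODE case $x' = f(x)$ (that is, $A = I$ and $g = -f$), uses $k = 1$ so that the BDF stages are backward Euler steps, and takes the MEBDF weights $\hat{\beta}_1 = 1$, $\beta_1 = 3/2$, $\beta_2 = -1/2$, which follow from requiring second-order accuracy of the final formula. For brevity it uses Newton's method with a finite-difference Jacobian instead of the frozen-Jacobian quasi-Newton iteration described above. All function names are hypothetical.

```python
import numpy as np

def solve_implicit(f, c, h, x0, tol=1e-12, max_iter=50):
    """Solve x = c + h*f(x) by Newton's method with a finite-difference Jacobian."""
    x = np.array(x0, dtype=float)
    n = x.size
    d = 1e-7
    for _ in range(max_iter):
        fx = f(x)
        res = x - c - h * fx
        if np.linalg.norm(res) < tol:
            break
        jac = np.eye(n)
        for j in range(n):
            e = np.zeros(n)
            e[j] = d
            jac[:, j] -= h * (f(x + e) - fx) / d
        x = x - np.linalg.solve(jac, res)
    return x

def mebdf1_step(f, x, h):
    """One step of the 1-step MEBDF sketch for x' = f(x)."""
    x_bar1 = solve_implicit(f, x, h, x)            # first BDF (backward Euler) step
    x_bar2 = solve_implicit(f, x_bar1, h, x_bar1)  # second BDF step, "extrapolated" value
    g1, g2 = f(x_bar1), f(x_bar2)                  # evaluation step
    c = x + h * (0.5 * g1 - 0.5 * g2)              # (beta_1 - 1) * g1 + beta_2 * g2
    return solve_implicit(f, c, h, x_bar1)         # MEBDF step

def integrate(f, x0, h, n_steps):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = mebdf1_step(f, x, h)
    return x
```

A quick check of the claimed order is to integrate $x' = -x$ to $t = 1$ with two step sizes: halving $h$ should reduce the error by a factor of roughly four. The same driver can be applied to a planar oscillatory system by passing a function `f` that returns a two-component array.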



3 Theoretical Results 

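In this section we prove new results on the order of the MEBDF applied to
DAE’s. Here we restrict ourselves to a constant stepsize $h$, i.e. $t_{i+1} = t_i + h$
for all $i$. For $\ell \ge k$, the numerical solution $\{x_\ell\}$ of the $k$-step MEBDF applied
to (1) satisfies

$$\frac{1}{h} A \sum_{i=0}^{k} a_i x_{\ell-i} + \hat{\beta}_k \, g(x_\ell, t_\ell) + (\beta_k - \hat{\beta}_k) \, \bar{g}_\ell + \beta_{k+1} \, \bar{g}_{\ell+1} = \delta_\ell. \qquad (2)$$

Here, $\delta_\ell$ describes the perturbations in the $\ell$-th step for $\ell \ge k$, which are introduced
by numerical computations including the errors arising from solving
the nonlinear equations. Moreover, we introduce

$$x_*(t): \text{ exact solution of (1)},$$

$$\tilde{x}_\ell := x_\ell - x_*(t_\ell), \qquad \ell \ge 0,$$

$$\tau_\ell := \frac{1}{h} A \sum_{i=0}^{k} a_i x_*(t_{\ell-i}) + \beta_k \, g(x_*(t_\ell), t_\ell) + \beta_{k+1} \, g(x_*(t_{\ell+1}), t_{\ell+1}), \qquad \ell \ge k.$$

Here $\tau_\ell$ represents the local discretization error of the $k$-step MEBDF in the
$\ell$-th step ($\ell \ge k$). Now (2) may be written as

$$\frac{1}{h} A \sum_{i=0}^{k} a_i \tilde{x}_{\ell-i} + \hat{\beta}_k B_\ell \tilde{x}_\ell + h(\tilde{x}_\ell, t_\ell) = \delta_\ell - \tau_\ell + v_\ell + w_\ell, \qquad (3)$$

where $B_\ell := g_x(x_*(t_\ell), t_\ell)$ and

$$h(x, t) := \hat{\beta}_k \left[ g(x + x_*(t), t) - g(x_*(t), t) - g_x(x_*(t), t) \, x \right],$$

$$v_\ell := (\beta_k - \hat{\beta}_k) \left[ g(x_*(t_\ell), t_\ell) - \bar{g}_\ell \right],$$

$$w_\ell := \beta_{k+1} \left[ g(x_*(t_{\ell+1}), t_{\ell+1}) - \bar{g}_{\ell+1} \right].$$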



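3.1 Index-1 DAE

In this section, we assume that (1) is an index-1 DAE. This implies that the
pair $\{A, B_\ell\}$ is index-1 tractable [12]. Let $Q$ be a projector onto Ker[$A$] and
$P := I - Q$. We define $G_{1\ell} := A + B_\ell Q$ and $\hat{\delta}_\ell$ by

$$\hat{\delta}_\ell := \begin{cases} Q G_{1\ell}^{-1} \delta_\ell, & \ell \ge k, \\ Q G_{1\ell}^{-1} h(\tilde{x}_\ell, t_\ell) + Q \tilde{x}_\ell + Q G_{1\ell}^{-1} B_\ell P \tilde{x}_\ell, & \ell < k. \end{cases}$$

$\hat{\delta}_\ell$ represents the defect in the algebraic part for $\ell \ge k$. For the starting values,
the corresponding defects are also described by $\hat{\delta}_\ell$.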

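Theorem 1. Suppose a constant $C_1 > 0$ exists such that the starting values
satisfy the relation

$$\| P x_\ell - P x_*(t_\ell) \| \le C_1 h^{k+1}, \qquad \ell < k, \qquad (4)$$

and suppose constants $C_2$ and $C_3$ exist such that

$$\| \hat{\delta}_\ell \| \le C_2 h^{k+1}, \qquad \| P \delta_\ell \| \le C_3 h^{k+1}, \qquad \ell \ge k. \qquad (5)$$

Then a constant $C > 0$ exists such that

$$\| x_*(t_\ell) - x_\ell \| \le C h^{k+1}, \qquad \ell \ge 0. \qquad (6)$$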

Outline of the proof (for full details see [2]): We use the projectors $P$
and $Q$ to split the DAE into an algebraic part and a differential part. Next,
we show the existence of a solution of the algebraic part. We use the inverse
function theorem to show that this solution is locally unique. Substituting this
solution into the differential part of the DAE results in an equation involving
only index-1 variables. This equation can also be solved uniquely with the
inverse function theorem. By applying the assumptions of the theorem we
get the final result.

Remarks. Most well-posed circuits lead to an index-1 DAE [12]. For this
type of DAE we conclude that the $k$-step MEBDF has order $k+1$ (cf. (6)),
while the $k$-step BDF has order $k$. Note that the $k$-step MEBDF-methods
are A-stable for $k \le 3$, while for BDF this is restricted to the case $k \le 2$ [3].
3.2 Index-2 DAE

Next, we assume that (1) is an index-2 DAE. Let $Q_{1\ell}$ be a projector onto
Ker[$G_{1\ell}$] and $P_{1\ell} = I - Q_{1\ell}$.

Theorem 2. Suppose a constant Ci > 0 exists such that the starting values 
satisfy the relation 

\\PPuM - PPuMU)\\ < I < k 

and suppose constants $C_2$ and $C_3$ exist such that

m\<c2h'^, i>k, ( 7 ) 

\m<C3h^+\ £>0. (8) 

Then a constant C > 0 exists such that 

$$\| x_*(t_\ell) - x_\ell \| \le C h^{k}, \qquad \ell \ge k. \qquad (9)$$

Outline of the proof (again, for full details see [2]). The structure of the
proof of this theorem is similar to the proof of the index-1 theorem, but
lengthier. With the use of projectors, the DAE is split into an algebraic, an
index-1 and an index-2 part. We first solve the algebraic equations and use
the solution for solving the other parts [2].

Remarks. Circuits which contain L-I cutsets or C-V loops lead to an index-2
DAE [12]. For this case, we proved that order reduction for MEBDF occurs
(which does not happen for BDF [12]). However the reduction for MEBDF is
moderate: the $k$-step MEBDF and the $k$-step BDF, applied to index-2 DAE’s,
have the same order, namely $k$. However, MEBDF has much better stability
properties than BDF [3].

4 Numerical Tests 

We performed some tests in Matlab with a variable step size fixed order BDF 
and MEBDF method. We made Work-Precision Diagrams, to compare the 
work needed for both methods. For more details we refer to [2]. 




Fig. 1. Rectifier Circuit 
4.1 Rectifier Circuit

Fig. 1 shows a rectifier circuit that serves the AC to DC conversion. It is
designed in such a way that it damps the incoming sine wave. We computed
a reference solution at $t = 0.16$ using the BDF method with a very
small time step. In Fig. 2(a) we see a Work-Precision Diagram in which we
compare the 1-step MEBDF method with the 2-step BDF method. We see
that the 1-step MEBDF method even performs slightly better than the 2-step
BDF. Fig. 2(b) presents a comparison for the 3-step BDF and the 2-step
MEBDF, in which the BDF method performs better. Clearly the methods are
comparable in these cases; however, MEBDF has the potential of better
stability.




Fig. 2. Work-Precision Diagram for second order (a) and third order (b) methods 



4.2 Oscillatory Test Problem 

To show how the MEBDF deals with oscillatory problems, we consider an
ODE example problem of the form

$$x' = y + h(r) x, \qquad (10)$$

$$y' = -x + h(r) y, \qquad (11)$$

where $r = \sqrt{x^2 + y^2}$. We assume that $0 < x_0 < 1$, $0 < y_0 < 1$. The equations
can be re-written in the following form:

$$r' = h(r) \, r,$$

$$\theta' = -1.$$
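The polar form can be verified directly. The following short derivation is a sketch that assumes the system reads $x' = y + h(r)x$, $y' = -x + h(r)y$ and uses $x = r\cos\theta$, $y = r\sin\theta$:

```latex
\begin{aligned}
r\,r' &= x\,x' + y\,y'
       = x\,(y + h(r)\,x) + y\,(-x + h(r)\,y)
       = h(r)\,(x^2 + y^2) = h(r)\,r^2
  &&\Longrightarrow\quad r' = h(r)\,r, \\
\theta' &= \frac{x\,y' - y\,x'}{r^2}
         = \frac{x\,(-x + h(r)\,y) - y\,(y + h(r)\,x)}{r^2}
         = -\frac{x^2 + y^2}{r^2}
  &&\Longrightarrow\quad \theta' = -1.
\end{aligned}
```

The angle therefore rotates at constant unit speed (clockwise with this sign convention), while the radius obeys a scalar autonomous equation whose equilibria are the zeros of $h$.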

For h{r) — r — 1 is the only stable limit, lim^_^oo ^(0 — 1- ^PPly 
the BDF and MEBDF method to equations (lO)-(ll), which results in ap- 
proximate solutions (xn, Vn) (n = 0, 1, . . .). We compute + and 
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plot this as function of t. In Fig. 3(a) we did this for the 1-step MEBDF 
and the 2-step BDF. We see that the MEBDF solution is less damped than 
the BDF solution of the same order. In Fig.3(b) we see that the 3-step BDF 
does not perform very well on this problem. This is due to the fact that the 
3-step BDF is not A-stable and for oscillatory problems only A-stable meth- 
ods perform well. Clearly, the MEBDF method is more suited for oscillatory 



Oscillatory test problem Oscillatory test problem 




Fig. 3. Plot of Tn = y/xn + Vn by second order (a) and third order (b) methods. 



problems than BDF. Here one can also fully exploit the better stability and 
convergence order properties of MEBDF. 
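How numerical damping shows up in $r_n$ can be illustrated with a small fixed-step experiment. The sketch below is not the MEBDF code used for the tests; it only compares the 1-step BDF (implicit Euler) with the 2-step BDF on equations (10)-(11). The choice h(r) = 1 - r, the step size, the initial values and all function names are assumptions made for this illustration.

```python
import math

def h(r):
    # Assumed damping function (not taken from the text); r = 1 is then
    # the stable limit of r' = h(r) r.
    return 1.0 - r

def f(x, y):
    """Right-hand side of equations (10)-(11)."""
    r = math.hypot(x, y)
    return y + h(r) * x, -x + h(r) * y

def solve_implicit(rhs_x, rhs_y, beta_dt, x, y, tol=1e-12, max_iter=50):
    """Newton iteration for z - beta_dt * f(z) = rhs (finite-difference Jacobian)."""
    eps = 1e-7
    for _ in range(max_iter):
        fx, fy = f(x, y)
        gx = x - beta_dt * fx - rhs_x
        gy = y - beta_dt * fy - rhs_y
        if abs(gx) + abs(gy) < tol:
            break
        fxa, fya = f(x + eps, y)
        fxb, fyb = f(x, y + eps)
        j11 = 1.0 - beta_dt * (fxa - fx) / eps
        j21 = -beta_dt * (fya - fy) / eps
        j12 = -beta_dt * (fxb - fx) / eps
        j22 = 1.0 - beta_dt * (fyb - fy) / eps
        det = j11 * j22 - j12 * j21
        # Newton update z <- z - J^{-1} g for the 2x2 system
        x, y = (x - (j22 * gx - j12 * gy) / det,
                y - (-j21 * gx + j11 * gy) / det)
    return x, y

def integrate(order, dt, t_end, x0=0.5, y0=0.5):
    """Fixed-step BDF1 or BDF2; returns the list of radii r_n."""
    xs, ys = [x0], [y0]
    for n in range(int(round(t_end / dt))):
        if order == 1 or n == 0:
            # BDF1: z_{n+1} - dt f(z_{n+1}) = z_n (also starts BDF2)
            x, y = solve_implicit(xs[-1], ys[-1], dt, xs[-1], ys[-1])
        else:
            # BDF2: z_{n+1} - (2/3) dt f(z_{n+1}) = (4 z_n - z_{n-1}) / 3
            rx = (4.0 * xs[-1] - xs[-2]) / 3.0
            ry = (4.0 * ys[-1] - ys[-2]) / 3.0
            x, y = solve_implicit(rx, ry, 2.0 * dt / 3.0, xs[-1], ys[-1])
        xs.append(x)
        ys.append(y)
    return [math.hypot(x, y) for x, y in zip(xs, ys)]

r_bdf1 = integrate(order=1, dt=0.1, t_end=50.0)
r_bdf2 = integrate(order=2, dt=0.1, t_end=50.0)
print(f"final radius: BDF1 {r_bdf1[-1]:.4f}, BDF2 {r_bdf2[-1]:.4f}")
```

With these assumed settings the exact radius tends to 1, so any persistent deficit of $r_n$ below 1 is numerical damping introduced by the integrator.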



5 Consistent Solutions 

BDF generates consistent solutions, even if one starts with inconsistent
initial values [2,12]. Because BDF serves as the predictor for the extrapolation
point, MEBDF inherits this property from BDF. This is in contrast to, for
instance, solutions obtained by using the Trapezoidal Rule. This property is
important when visualizing results for problems with discontinuities. It is
known that computing consistent initial values from the DC operating point
only requires solving an additional linear system [5]. Another way of getting
consistent initial values is to integrate forward until the (inconsistent)
initial values no longer influence the solution and then to integrate backward.

6 Conclusion 

MEBDF methods offer an alternative to BDF methods in the case of DAEs
with index not exceeding 2. The MEBDF methods combine higher convergence
order with better stability properties and better numerical damping.
Hence, more rigorous benchmark tests for circuits occurring in practice are
worthwhile.

Finally, we gratefully acknowledge discussions, including assistance in
providing implementation, with Jeff Cash (Imperial College, London), Jason
Frank (CWI, Amsterdam) and Caren Tischendorf (Von Humboldt Universität,
Berlin).
Abstract. In this paper we present a theoretical foundation for tail electron
hydrodynamical models (TEHM) in semiconductors with application to bulk silicon.



1 Introduction 



The description of the functioning of modern electron devices requires
increasingly accurate physical models of carrier transport in semiconductors in
order to deal with high-field phenomena such as impact ionization, thermal
self-heating, etc. High-energy electron phenomena are of particular interest
in the accurate evaluation of the degradation and breakdown of devices.
Monte Carlo methods (MC) are extremely CPU intensive and therefore not
practical for routine design applications. Traditional hydrodynamical models
for carrier transport in semiconductors cannot describe high energy electrons
since they deal only with average values over the whole carrier population.
On the other hand, several considerations [1] support the existence of two
thermal distributions at different temperatures for electrons having energies
respectively lower and higher than a suitable threshold energy, the so-called
cold and hot electrons. This is the reason why several authors [2,3] have
introduced new fluid dynamical models in which two well-defined
subpopulations of electrons are considered. Recently [4], a theoretical foundation
for these models has been given by utilizing the moment method and a
closure technique by which one can obtain both the constitutive fluxes and the
production terms, appearing in the moment equations, as functions of the
fundamental hydrodynamical variables, without resorting to MC.

However, for the sake of simplicity, a parabolic band approximation has been
used also for the hot electrons. Here, we propose an improvement of the
model in which the Kane dispersion relation is employed for approximating
the electron energy in the conduction band. Furthermore, we take as
threshold energy the one corresponding to the energy gap in Si, so as to have
direct information about the electrons which can give rise to impact ionization.

In order to test the model we apply it to bulk Si and compare the results
with those obtained by a standard hydrodynamical model. Comparisons with
Monte Carlo results are also presented for the hot electron variables.
2 Boltzmann transport equation and Moment equations



We treat the case of silicon unipolar devices for which the electrons
contributing to charge transport are those in the six equivalent valleys around
the six minima of the conduction band [5]. We assume that, for those electrons,
the relation between the energy, $\mathcal{E}$, and the quasi-wave vector, $\mathbf{k}$, both
measured from the bottom of the conduction band, is given by the Kane
dispersion relation

$\mathcal{E}(k)\,[1 + \alpha\,\mathcal{E}(k)] = \frac{\hbar^2 k^2}{2 m^*}, \quad \mathbf{k} \in \mathbb{R}^3,$ (1)

which involves a parameter $\alpha = 0.5\ \mathrm{eV}^{-1}$, called the non-parabolicity
factor, while $m^* = 0.32\, m_e$ is the electron effective mass, $m_e$ being the free
electron mass.
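Relation (1) is a quadratic equation in the energy, so it can be inverted in closed form, and differentiating it with respect to k gives the modulus of the group velocity, $\hbar k / (m^* (1 + 2\alpha\mathcal{E}))$. The following sketch evaluates both; the function names and the choice of units (energies in eV, wave vectors in 1/m) are assumptions made for this illustration.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant [J s]
Q = 1.602176634e-19      # elementary charge [C]; also joule per eV
M_E = 9.1093837015e-31   # free electron mass [kg]

ALPHA = 0.5              # non-parabolicity factor [1/eV]
M_STAR = 0.32 * M_E      # effective mass [kg]

def kane_energy(k):
    """Energy [eV] for wave-vector modulus k [1/m], solving relation (1)."""
    gamma = (HBAR * k) ** 2 / (2.0 * M_STAR) / Q   # parabolic energy [eV]
    # positive root of alpha*E^2 + E - gamma = 0
    return (math.sqrt(1.0 + 4.0 * ALPHA * gamma) - 1.0) / (2.0 * ALPHA)

def group_velocity(k):
    """Modulus of (1/hbar) dE/dk [m/s] obtained by differentiating (1)."""
    return HBAR * k / (M_STAR * (1.0 + 2.0 * ALPHA * kane_energy(k)))

def k_at_energy(energy_ev):
    """Wave-vector modulus [1/m] at which the energy equals energy_ev [eV]."""
    gamma = energy_ev * (1.0 + ALPHA * energy_ev) * Q   # hbar^2 k^2 / (2 m*) [J]
    return math.sqrt(2.0 * M_STAR * gamma) / HBAR

k_gap = k_at_energy(1.12)   # wave vector at an energy of 1.12 eV
print(f"k = {k_gap:.3e} 1/m, v = {group_velocity(k_gap):.3e} m/s")
```

For small k the result reduces to the parabolic band, while at larger energies the factor $1 + 2\alpha\mathcal{E}$ lowers the velocity below the parabolic value $\hbar k / m^*$.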

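At a kinetic level, electrons in a semiconductor are described by a one-particle
distribution function, $f(\mathbf{x}, t, \mathbf{k})$, whose evolution is governed by the
semiclassical Boltzmann equation (BTE) coupled to the Poisson equation for the
electric field $\mathbf{E}$

$\frac{\partial f}{\partial t} + \mathbf{v}(\mathbf{k}) \cdot \nabla_{\mathbf{x}} f - \frac{q\,\mathbf{E}}{\hbar} \cdot \nabla_{\mathbf{k}} f = C[f],$

$\nabla_{\mathbf{x}} \cdot (\epsilon\,\mathbf{E}) = q\,[N_+(\mathbf{x}) - N_-(\mathbf{x}) - n(\mathbf{x})].$ (2)

Here, q represents the absolute value of the electron charge, $\hbar$ the reduced
Planck constant, $\epsilon$ the dielectric constant, $N_+$ and $N_-$ the donor and acceptor
concentrations respectively, and n the total electron number density. The
electron group velocity $\mathbf{v}$ depends on the energy $\mathcal{E}$ through the relation

$\mathbf{v}(\mathbf{k}) = \frac{1}{\hbar} \nabla_{\mathbf{k}} \mathcal{E}(\mathbf{k}).$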

C[f] is the collision term, which takes account of the various scattering
mechanisms the electrons undergo in a semiconductor. In the non-degenerate case,
its form is

$C[f] = \int \left[ w(\mathbf{k}', \mathbf{k})\, f(\mathbf{k}') - w(\mathbf{k}, \mathbf{k}')\, f(\mathbf{k}) \right] d\mathbf{k}',$

where $w(\mathbf{k}, \mathbf{k}')$ represents the sum of the various electron scattering rates
from a state with wave vector $\mathbf{k}$ to one with wave vector $\mathbf{k}'$. We will take into
account the following scattering mechanisms for silicon:



— electron - acoustical phonon intravalley scattering, 

— electron - phonon intervalley scattering, for which there are six
contributions: the three $g_1$, $g_2$, $g_3$ and the three $f_1$, $f_2$, $f_3$ optical and
acoustical intervalley scatterings [5],

— electron-impurity scattering. 
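Introducing a threshold energy, $\mathcal{E}_{thr}$, for the electrons (here we use the
threshold energy of impact ionization, $\mathcal{E}_{thr} = 1.12$ eV), and the kinetic
quantities 1, $\mathbf{v}$, $\mathcal{E}$, $\mathcal{E}\mathbf{v}$, we describe the electron flow by means of the
fundamental variables: number density, average velocity, energy and energy flux
of electrons having energy less than and greater than $\mathcal{E}_{thr}$:

$n_H = \int_{A_H} f_H\,d\mathbf{k}, \quad n_H \mathbf{V}_H = \int_{A_H} \mathbf{v} f_H\,d\mathbf{k}, \quad n_H W_H = \int_{A_H} \mathcal{E} f_H\,d\mathbf{k}, \quad n_H \mathbf{S}_H = \int_{A_H} \mathcal{E}\mathbf{v} f_H\,d\mathbf{k},$ (3)

$n_C = \int_{A_C} f_C\,d\mathbf{k}, \quad n_C \mathbf{V}_C = \int_{A_C} \mathbf{v} f_C\,d\mathbf{k}, \quad n_C W_C = \int_{A_C} \mathcal{E} f_C\,d\mathbf{k}, \quad n_C \mathbf{S}_C = \int_{A_C} \mathcal{E}\mathbf{v} f_C\,d\mathbf{k},$ (4)

with $A_H = \{\mathbf{k} : \mathcal{E}(\mathbf{k}) > \mathcal{E}_{thr}\}$ and $A_C = \mathbb{R}^3 - A_H$; henceforth the subscripts
H and C will indicate quantities referring to hot and cold electrons
respectively.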

From the BTE one can obtain the following evolution equations for these 
macroscopic quantities: 



dni 
dt 
driH Ph 
dt 

driH Wh 
dt 









di"^ 

dUH Ujj 



dxd 

+ - qSthrE^Ni + qEinHV}j = Cw„ , 

+ q Ej Uh — q EthrEj = Cgi , 



q Ej T'i +qE^nn = Cpi , 



duH S 



dt 



'JL 



+ 



d nu F 'J 
dx^ 



( 5 ) 



dnc 

dt 



+ 



dnc Vc 
dx^ 



= Cn 



qEiM\ 






dnc Pc’ I dnc i 7712 

+qE^nc = Cpi, 



dt 



qEjVi, 



dnc Wc dnc S}. 



dt 
dnc S}. 
dt 



+ 

-h 



dx'^ 



dnc 



+ qEiUcV^ = Cwc - Q ^thrE^ Mi, 
+ q Ej nc = Cgi - q EthrEj , 



(6) 



where summation over repeated lowercase letters is understood and 



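$P_A^i = \frac{1}{n_A} \int_{A_A} \hbar k^i f_A\,d\mathbf{k} = m^* (V_A^i + 2\alpha S_A^i)$ (average crystal momentum),

$U_A^{ij} = \frac{1}{n_A} \int_{A_A} v^i \hbar k^j f_A\,d\mathbf{k}$ (average crystal momentum flux),

$F_A^{ij} = \frac{1}{n_A} \int_{A_A} \mathcal{E}(\mathbf{k})\, v^i v^j f_A\,d\mathbf{k}$ (average flux of energy flux), (7)

$C_{n_A} = \int_{A_A} C[f]\,d\mathbf{k}$ (density production),

$C_{P_A}^i = \int_{A_A} \hbar k^i C[f]\,d\mathbf{k}$ (crystal momentum production),

$C_{W_A} = \int_{A_A} \mathcal{E}(\mathbf{k})\, C[f]\,d\mathbf{k}$ (energy production),

$C_{S_A}^i = \int_{A_A} \mathcal{E}(\mathbf{k})\, v^i C[f]\,d\mathbf{k}$ (energy flux production),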

all these quantities referring to electrons in zone A, with A = H, C. The
first set of equations is found by multiplying the BTE by the kinetic quantities
1, $\hbar\mathbf{k}$, $\mathcal{E}$, $\mathcal{E}\mathbf{v}$ and integrating over $A_H$, while the second set is derived by
subtracting the first one from the corresponding usual moment equations.

The surface terms 

= I l^fH k*fH I'^da, ( 8 ) 



with = {k : 5(k) = ^thr}, and inner normal to represent the increas- 
ing rate of the corresponding macroscopic quantities due to the net migration 
of carriers from one energy zone to the other owing to the driving electric 
field. 

In equations (5), (6), in addition to the fundamental variables (3), (4), the
extra unknowns (7), (8) appear, so that, in order to have a closed system
of equations, it is necessary to express the latter variables in terms of the
former ones. A way to get constitutive relations which rests on a sound physical
basis is to use the maximum entropy principle; for more details the reader
is referred to [6-8,10]. This principle furnishes the form of the distribution
functions that make the best use of the knowledge of a finite number of
moments.

In particular, having assumed as fundamental variables $n_A$, $\mathbf{V}_A$, $W_A$ and $\mathbf{S}_A$,
the maximum entropy distributions $f_A^{ME}$, A = H, C, are those which make
the electron entropy extremal under the constraints of fixed values of those
variables.

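The electron entropy can be written as

$s[f_C, f_H] = -k_B \left[ \int_{A_C} (f_C \log f_C - f_C)\,d\mathbf{k} + \int_{A_H} (f_H \log f_H - f_H)\,d\mathbf{k} \right],$

with $k_B$ the Boltzmann constant. Therefore the distribution functions $f_A$,
A = H, C, that maximize it under the constraints $n_A$, $\mathbf{V}_A$, $W_A$ and $\mathbf{S}_A$ are given
by

$f_A^{ME} = \exp\left[ -\left( \frac{\lambda^A}{k_B} + \lambda^W_A\,\mathcal{E} + \lambda^V_{A,i}\,v^i + \lambda^S_{A,i}\,\mathcal{E} v^i \right) \right],$

where the $\lambda$'s are Lagrange multipliers that take care of the constraints.

In order to determine the Lagrange multipliers in terms of $n_A$, $\mathbf{V}_A$, $W_A$,
$\mathbf{S}_A$, A = H, C, one has to insert the expressions of the maximum entropy
distribution functions into (3)-(4) and solve the resulting system. After that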
the closure relations can be obtained by substituting the distribution
functions, expressed in terms of the fundamental variables, into (7) and (8).
However, on account of the algebraic difficulties, it is possible to get only
approximate expressions for the Lagrange multipliers under physically
reasonable assumptions on the distribution functions.

In particular, on the basis of MC results, we assume that the anisotropy of
the distribution functions $f_A$, A = H, C, remains small even out of equilibrium.
We formally introduce a small anisotropy parameter $\delta$, assume that the
Lagrange multipliers are analytic in $\delta$ and expand them around $\delta = 0$ up to the
first order. The representation theorems for isotropic functions imply that
$\lambda^A$ and $\lambda^W_A$ are of order zero in $\delta$, while $\lambda^V_A$ and $\lambda^S_A$ are of the first order.
Therefore the approximate expressions of the maximum entropy functions

$f_A^{ME} \simeq \exp\left( -\frac{\lambda^A}{k_B} - \lambda^W_A\,\mathcal{E} \right) \left[ 1 - \left( \lambda^V_{A,i}\,v^i + \lambda^S_{A,i}\,\mathcal{E} v^i \right) \right], \quad A = H, C$ (9)

are used to obtain the following results for the Lagrange multipliers

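$\lambda^H = -k_B \log\left( \frac{\hbar^3 n_H}{4\pi m^* \sqrt{2 m^*}\; d_0^H} \right), \quad \lambda^W_H = g_H(W_H),$

$\lambda^V_{H,i} = b_{11}^H V_{H,i} + b_{12}^H S_{H,i}, \quad \lambda^S_{H,i} = b_{12}^H V_{H,i} + b_{22}^H S_{H,i},$

$\lambda^C = -k_B \log\left( \frac{\hbar^3 n_C}{4\pi m^* \sqrt{2 m^*}\; d_0^C} \right), \quad \lambda^W_C = g_C(W_C),$

$\lambda^V_{C,i} = b_{11}^C V_{C,i} + b_{12}^C S_{C,i}, \quad \lambda^S_{C,i} = b_{12}^C V_{C,i} + b_{22}^C S_{C,i},$ (10)

where

$d_k^A(\lambda^W_A) = \int_{\Delta\mathcal{E}_A} \mathcal{E}^k \sqrt{\mathcal{E}(1 + \alpha\mathcal{E})}\,(1 + 2\alpha\mathcal{E}) \exp(-\lambda^W_A \mathcal{E})\,d\mathcal{E},$

with $\Delta\mathcal{E}_H = (\mathcal{E}_{thr}, +\infty)$ and $\Delta\mathcal{E}_C = (0, \mathcal{E}_{thr})$. The $g_A$ are the inverse functions
of

$g_A^{-1}(\lambda^W_A) = \frac{d_1^A}{d_0^A} = W_A,$ (11)

and the coefficients $b_{ij}^A$, A = H, C, are given by

$b_{11}^A = \frac{a_{22}^A}{\Delta^A}, \quad b_{12}^A = -\frac{a_{12}^A}{\Delta^A}, \quad b_{22}^A = \frac{a_{11}^A}{\Delta^A},$

with

$a_{11}^A = -\frac{2 p_0^A}{3 m^* d_0^A}, \quad a_{12}^A = -\frac{2 p_1^A}{3 m^* d_0^A}, \quad a_{22}^A = -\frac{2 p_2^A}{3 m^* d_0^A},$

and

$\Delta^A = a_{11}^A a_{22}^A - (a_{12}^A)^2, \quad A = H, C,$ (12)

$p_k^A$ being

$p_k^A = \int_{\Delta\mathcal{E}_A} \frac{\mathcal{E}^k\,[\mathcal{E}(1 + \alpha\mathcal{E})]^{3/2}}{1 + 2\alpha\mathcal{E}} \exp(-\lambda^W_A \mathcal{E})\,d\mathcal{E}, \quad A = H, C.$

For the inversion of the functions (11) we have resorted to a numerical
approach.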
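One possible way to carry out such a numerical inversion is sketched below; it is not the authors' procedure. It assumes that the integrals $d_k$ have the form $\int \mathcal{E}^k \sqrt{\mathcal{E}(1+\alpha\mathcal{E})}\,(1+2\alpha\mathcal{E}) \exp(-\lambda^W \mathcal{E})\,d\mathcal{E}$ over the cold zone $(0, \mathcal{E}_{thr})$ or the hot zone $(\mathcal{E}_{thr}, \infty)$, that the mean energy is $W = d_1/d_0$, and it uses composite Simpson quadrature with bisection. The truncation of the hot tail, the brackets and all function names are illustrative assumptions.

```python
import math

ALPHA = 0.5    # non-parabolicity factor [1/eV]
E_THR = 1.12   # threshold energy [eV]

def d_k(k, lam, lo, hi, n=2000):
    """Composite Simpson value of the integral d_k over [lo, hi] (n even)."""
    def g(e):
        return (e ** k * math.sqrt(e * (1.0 + ALPHA * e))
                * (1.0 + 2.0 * ALPHA * e) * math.exp(-lam * e))
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * step)
    return total * step / 3.0

def mean_energy(lam, zone):
    """W = d_1 / d_0 for zone 'C' (cold) or 'H' (hot, requires lam > 0)."""
    if zone == "C":
        lo, hi = 0.0, E_THR
    else:
        # the tail beyond 40 decay lengths is neglected (assumption)
        lo, hi = E_THR, E_THR + 40.0 / lam
    return d_k(1, lam, lo, hi) / d_k(0, lam, lo, hi)

def invert_mean_energy(w_target, zone, lam_lo, lam_hi, tol=1e-10):
    """Bisection for lam with d_1/d_0 = w_target; W decreases as lam grows."""
    for _ in range(200):
        lam_mid = 0.5 * (lam_lo + lam_hi)
        if mean_energy(lam_mid, zone) > w_target:
            lam_lo = lam_mid   # energy too high: a larger multiplier is needed
        else:
            lam_hi = lam_mid
        if lam_hi - lam_lo < tol:
            break
    return 0.5 * (lam_lo + lam_hi)

# Example: multiplier belonging to a hot-electron energy of 1.1640 eV
lam_h = invert_mean_energy(1.1640, "H", 1.0, 100.0)
print(f"lambda^W_H = {lam_h:.4f} 1/eV")
```

Bisection is chosen here only because $d_1/d_0$ is monotone in the multiplier, which makes a bracketing method robust; a tabulation followed by interpolation would serve equally well.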



3 Application to the case of bulk Si 



In this section we test the model in the case of bulk Si. The physical situation 
is represented by a Si semiconductor with uniform doping, which we take to 
be iV+ = 10 ^^cm~^. All the above-mentioned scattering mechanisms are con- 
sidered. As regards the balance equations, taking into account the symmetry 
with respect to space translations, we can drop the spatial dependence. More- 
over, in cases when a constant bias voltage is applied to the semiconductor, 
the Poisson equation (2)$_2$ is satisfied with n equal to the value of the doping
concentration and E constant. Since the motion is along the direction of the
electric field, we can take this as the x-direction, and therefore the balance
equations reduce to the following set of ordinary differential equations



UH = nc G" {Wc) - riH L” {Wh) + qEDniWH^Vn, Sh), 



d 
dt 

m* —nnYH = -qnHE{l - 2 am* Gh) + ^^[(cn - 2am*cfi)V// -h 
+ {ci2 - 2 a m*C22)SH] - uh q e, 

1 -h za Cthr 



( 13 ) 



( 14 ) 



dt 



ubWh — —riH qVn E nc Gw{^c) — riH L^{Wh) + rin Cw^ {Wh) + 
-\-q£thrnHE DniWH.VH^Sn)^ ( 15 ) 

= -quHEGn + nHVH{c^iV„ + cgSn) - ( 16 ) 

dt l-\-zacthr 



dt 



nc 



dt 



Tin, 



m* —ncVc = -qncE{l - 2 am* Gc) + ^c[(cfi - 2 am*C 2 \)Vc + 
dt 

, / c o * ^ c 1 I {Wh) 

-h(ci2 - 2 am C22)5c] + tih q 7—^ — — F, 

1 -f z a Cthr 



( 17 ) 



( 18 ) 



^ncWc = -ncqVcE + nn G%{Wh) - nc L^{Wc) + nc (Wc) + 
dt 

-q£thrriHEDn{WH^VHiSH)^ ( 19 ) 



dncSc 

dt 



-qncEGc + nc 



Vc(c‘ixVc^c^ 2 Sc) + nHq^if^^^^E, (20) 



where $V_A$ and $S_A$ are the x-components of $\mathbf{V}_A$ and $\mathbf{S}_A$ and $G_A$ is the
xx-component of $G_A^{ij}$, A = H, C, while the production terms $G^n$, $L^n$, $G^w$,
$L^w$, $c^A_{ij}$, A = H, C, i, j = 1, 2, and the surface terms $D_n$ and $D_w$ are
tabulated functions of their arguments.
The solutions of (13)-(20) for electric fields equal to 1, 2, 5 and 10 V/μm,
respectively, are reported in Figs. 1, 2 and 3. We notice that, in agreement
with the results in [2], the stationary value of $W_H$ varies slowly with the
electric field, due to the Kane approximation to the structure of the silicon
conduction band.

We also compare the results concerning the hot electron variables with those
obtained by MC, see Table 1 and Table 2. The different qualitative behaviour
in the velocity of the hot electrons is due to the fact that in the two-fluid
model $V_H$ starts decreasing at electric fields higher than 10 V/μm, as does
the total electron mean velocity in the one-fluid model. In Figs. 4 and





         TEHM                  MC
n_H      0.14 %                0.13 %
W_H      1.1640 eV             1.1814 eV
V_H      2.1305 10^7 cm/s      2.7377 10^7 cm/s

Table 1. Comparison between the results obtained by the tail electron
hydrodynamical model and those derived by MC simulation, E = 7.5 V/μm.





         TEHM                  MC
n_H      0.97 %                0.87 %
W_H      1.1751 eV             1.2133 eV
V_H      2.2637 10^7 cm/s      2.6390 10^7 cm/s

Table 2. Comparison between the results obtained by the tail electron
hydrodynamical model and those derived by MC simulation, E = 10 V/μm.



5 we also compare the results for the average quantities over the
whole electron population with those derived by a standard hydrodynamical
model for semiconductors based on the maximum entropy principle [9]. The
agreement is very good, so that it is possible to state that, while the model
gives direct information on the hot electrons, no accuracy is lost regarding
the total electron population.
Fig. 1. The time evolution of the hot electron density and average velocity for
different values of the electric field and for N-\. = 10^ 




Fig. 2. The time evolution of the hot electron energy and energy flux for different 
values of the electric field and for = 10^^ /cm^. 




Fig. 3. The time evolution of the cold electron velocity and energy for different 
values of the electric field and for AT+ = 10^^/cm^. 
Fig. 4. Comparison of the time evolution of the average electron velocity for E =
2, 10 V/μm. Continuous line: two-population model; crosses: single-population model




Fig. 5. Comparison of the time evolution of the average electron energy for E =
2, 10 V/μm. Continuous line: two-population model; crosses: single-population model
Abstract. Thermal effects increasingly influence the electrical behaviour of
circuits. Therefore it is necessary to take power dissipation and temperature
evolution into account. In order to analyze large systems of integrated circuits,
this has to be realized very efficiently. Thus we introduce a thermal network
model consisting of 0D and 1D thermal elements approximating the full heat
aspect, but keeping the system relatively small. After semi-discretization, this
approach yields a coupled DAE system. Owing to the largely differing time scales,
we outline the basics of a multirate co-simulation algorithm, which is based on
an averaging technique. Its potential and feasibility are demonstrated on a
simple, yet instructive, test circuit. As an outlook we discuss the application
to thermal models of SOI chips.



1 Introduction - Task 

Owing to growing package densities, formerly secondary effects like
self-heating become important for the general behaviour of a chip. Industry
predicts up to 100 W/cm^2 in the near future [7]. Therefore the temperature
rise has to be covered by circuit simulation in an adequate way.

Commonly, circuit simulators use more or less independent networks for
the thermal aspect: thermally modelled elements like transistors are equipped
with a thermal network, mostly leaving out any efficient interaction
of these networks.

Thus modelling takes place in the following circumstances: On the one
hand, circuits are described by zero-dimensional electric objects equipped
with an incidence matrix conveying the topology. On the other hand, powers
are dissipated and stored locally. Moreover, temperature slowly levels out
according to the temperature gradient. And the thermal problem is a problem
in real space, with a spatial heat distribution. That is, coming from ordinary
circuits, the needed spatial coordinate is not at hand. Thus, first in this
paper, we address the modelling of the electro-thermal problem.

Furthermore, the time scales of electric systems and thermal aspects differ
by several orders of magnitude for a heat conduction over a centimeter.
Thus we have a multirate setting, which has to be addressed [2]. In this
multiphysical setup, an energy coupling enables the application of a multirate
co-simulation, which is addressed in the later part of this work and applied
to a test circuit.
2 First order thermal modelling

The modelling constraints are:

(1) feasibility in circuit simulators, that is, no thermal networks larger
than the electric ones with respect to the number of equations;

(2) usage of the available information as far as possible - only a little
additional topological information shall be supplied;

(3) facilitation of a fast simulation technique.

A thermal network with interacting domains has to have a spatial dimension,
and we have to deal at least with 1D approximations. Regarding the
computational effort, lower dimensions are preferable. On that basis we
propose a thermal model, the accompanied thermal network, which consists of
0D and 1D thermal elements; thus it is a trade-off between modelling and cost.
However, that network can be regarded as a step in the hierarchy of thermal
modelling towards a full 3D evolution.

Now, let us make the modelling precise. Let A be the circuit topology
(obtained by modified nodal analysis - see [5]); then $A_{lp} \subset A$ (in terms of
branches) shall describe the thermally relevant zero-dimensional elements,
and $A_{tr}$ the 1D elements, which could be branches of A as well as additional
thermal 'interconnects'. For simplicity we normalize the spatial extension of
each element: $x \in [0, 1]$. Furthermore, we separate left ($x = 0$) and right-hand
($x = 1$) ends in the incidence matrix $A_{tr} = B - D$, where the matrix entries
of B, D belong to {0, 1}, that is, an ordering with respect to the ends.
Since coupled 0D elements have a common heat capacity, we form connected
units; this involves two identifications: P and S identify thermal branches
and electric network nodes with the corresponding 0D units, respectively. In
that way, the heat mass for each unit is given as $\tilde{M} = P M P^T$, where M is the
diagonal matrix of heat mass with entries $m = \rho\,c\,V$ for each 0D element,
using its density $\rho$, heat capacity c and volume V.

For a simplified setup, we define artificial 0D units for the junctions of 1D
elements only, which allows boundaries to be treated equally. The units'
temperatures are denoted by T.

In total, there shall be a number of $m_{tr}$ 1D elements and $m_{lp}$ 0D elements.

The networks, and the 0D and 1D elements, have to interact; therefore the
coupling structure has to be described. If we split the dissipated power into
the 1D and 0D contributions, $E_{tr}$ and $E_{lp}$, respectively, then we have:

• Circuit-to-Heat: The power dissipation of 0D elements is a global source
for its thermal unit. The net unit power transition is given by $P E_{lp}$.

A 1D element dissipates power as a 0D quantity only if the branch
belongs to A. We obtain a local dissipated power via a specified distribution
function: $p_i(x, T) / (R_i(T)\,a_i(x))$, where $\nabla_x R = p$ and $a_i(x)$ denotes
the cross-section.
• Heat-to-Heat: In the accompanied thermal network, every 1D element is
connected to a 0D unit and vice versa. The heat passing from thermal
element to element is given by a corresponding 1D-to-0D flux. Thus we sum
up these fluxes: for each connected unit the sum runs over all attached
1D elements; inward flow minus outward flow gives

$B_{fac}\,\Lambda(0)\nabla_x T(0,t) - D_{fac}\,\Lambda(1)\nabla_x T(1,t) = (B_{fac}, -D_{fac}) \begin{pmatrix} \Lambda(0)\nabla_x T(0,t) \\ \Lambda(1)\nabla_x T(1,t) \end{pmatrix} = S\,(B, -D) \begin{pmatrix} \Lambda(0)\nabla_x T(0,t) \\ \Lambda(1)\nabla_x T(1,t) \end{pmatrix},$

where $B_{fac} = SB$, i.e., a projection of B into the accompanied thermal
network ($D_{fac}$ is analogous), and $\Lambda$ the diagonal matrix of heat fluxes times
the corresponding local cross-sections.

• Heat-to-Circuit: The 0D branch temperatures are simply obtained by
$P^T T$. Moreover, it is possible to have heat distribution dependent circuit
parameters for the 1D thermal elements. In that case, there is a kind of
density, which describes the ratio of local electric effect to the total effect
(cf. [3]).

Now we can formulate the corresponding mathematical description given
in Box 1 (the coupling variables appear boxed). First, using modified nodal
analysis, the electric network is described by a set of differential algebraic
network equations in the node potentials u and branch currents $j_V$, $j_L$ (through
voltage sources and inductors), and the system is closed by a consistent
initial value (DAE-IVP). Secondly, the thermal network is formed: for each 1D
element there is a heat equation in the temperature variable $T(x, t)$,
$(x, t) \in [0, 1] \times [0, \infty)$; for the connected units, the temperatures $T(t)$,
$t \in [0, \infty)$, are determined solely from temporal heat change. Additionally,
we model cooling to an ambient temperature $T_{env}$ as a first order effect, i.e.,
it is described by Newton's cooling and depends, roughly, on the surface to
volume ratio $\Gamma$ and a transition factor $\gamma$. The thermal network is completed
by the identification of the boundaries of 1D elements with 0D unit
temperatures (BC) and initial values (IV). Overall, we thus have parabolic PDEs
coupled with DAEs via right-hand sides and source terms.



3 Simulation algorithm 

A large potential of savings in the computation of thermal-electric systems
is found in comparing the time scales of the subsystems: circuit signal time
$\tau_{circuit}$ (e.g. input) and thermal relaxation time $\tau_{heat}$:

^circuit ^ Id • • • Id sec 

$\tau_{heat} = l^2 C / \lambda \approx 1 \ldots 10$ sec ($l$ = length in cm).
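The order of magnitude of the thermal relaxation time can be checked with a short calculation. The sketch below evaluates $l^2 C / \lambda$ with C the heat capacity per volume and $\lambda$ the heat conductivity; the material values are approximate room-temperature figures for copper and are assumptions of this example, not data from the paper.

```python
def thermal_relaxation_time(length_m, heat_capacity_per_volume, conductivity):
    """Return l^2 * C / lambda in seconds (SI units throughout)."""
    return length_m ** 2 * heat_capacity_per_volume / conductivity

# Approximate values for copper (illustrative assumptions):
RHO_CU = 8960.0       # density [kg/m^3]
CP_CU = 385.0         # specific heat [J/(kg K)]
LAMBDA_CU = 400.0     # heat conductivity [W/(m K)]

C_CU = RHO_CU * CP_CU                     # heat capacity per volume [J/(m^3 K)]
tau_1cm = thermal_relaxation_time(0.01, C_CU, LAMBDA_CU)
tau_4cm = thermal_relaxation_time(0.04, C_CU, LAMBDA_CU)
print(f"tau(1 cm) = {tau_1cm:.2f} s, tau(4 cm) = {tau_4cm:.1f} s")
```

The quadratic dependence on the length is the reason why shrinking the spatial extension quickly reduces the separation between the electric and thermal time scales.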

Here we have an extraordinary multirate behaviour. A straightforward and
popular idea for tackling coupled problems is co-simulation, where each
physical subsystem is addressed separately by some adapted algorithm or available
Box 1:



Coupled thermo-electric problem. 



electric network: (DAE-IVP) A == (Ac, Ac, Al, A/, Ay) 

0 = Acq( AI;u(<), ^) + Acr( A^u(<), [t^^) + Al (f ) + A7^(f ) + AvjvW 



0 = Ayu(t) — v(t) 

(IV) x(to) = (uo, Jl,o, Jv,oV 
coupling interface: 

(EL(<),E|p(i )7 =E (0 = diag(jB)AlpU, 
thermal network: (PDAE-BIVP) x 6 i?= [ 0 , l]’™tr 



P'T, R 



(ID) Ci'Tii^Xyt') — Vx(AiVx 7 )(x,t)) 'yFi' (T) (x, t) Tenv) 

Pi{x, Ti) 






(OD) 



(BC) 

(IC) 



Hi (t, 7)) di 

0 = Vx-Rt - pi(x,Ti{x,t)) (i = 1 , . . . ,mtr) 

//l( 0 )VxT (0 
C(l)VxT(l 






'yF(X TenvUfc) 



+ p 



Eip(^) 



T( 0 ,t)-B]^,T(t), 
X(a:, 0 ) =Tenv^m+j. 



T(l,t) = DLT(t) 



Notation: C heat capacity per volume; $\Lambda$ heat conductivity; p/R local
distribution of electric parameter; a cross-section.



tool. Therefore time-stepping and coupling, which is needed to create an
efficient and robust scheme, is slightly more delicate.

Our proposal is to use multirate co-simulation, that is, to perform no
iteration (as long as possible). To enable that, following [4] we spend some
additional effort on computing the coupling variables (dissipated powers)
carefully and use an averaging technique. The outline of that algorithm is
given in Box 2.

To obtain a full waveform relaxation scheme, we have to add a convergence
check to the algorithm between positions 2) and 3), and need to iterate
steps 1) and 2). The thermal network in step 2) is preferably semi-discretized
in space and then time integration is performed (method of lines), since we
think of common SPICE-like simulators. That means a coupled system of
DAEs is integrated over time. By introducing the dissipated energies
(integral over the powers), we couple the subsystems by differential variables,
and thus waveform relaxation is suitable [1].
Box 2: Simulation Algorithm.

0) define communication step size H and window [t_0, t_0 + H]

1) compute electric network and energies of desired accuracy over
   [t_0, t_0 + H] (keep temperature constant);
   total 'local' dissipated energies
   (local = elementwise, using available quantities in the simulator)

2) compute thermal part for [t_0, t_0 + H], add energy linearly!

3) go to one: t_0 := t_0 + H



Multirate is very naturally achieved by the ’decoupled’ computation of 
electric and thermal network. But this has a price: the solution of the full 
coupled system is only of order 1. However, we are mostly interested in the 
precise computation of the electric network. A splitting technique can restore 
the order. 
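The structure of the algorithm in Box 2 can be sketched on a toy problem. The example below is not the test circuit of this paper: it assumes a sine source driving a temperature-dependent resistor in series with a capacitor, and a single lumped thermal node with Newton cooling. All parameter values and function names are hypothetical. In each communication window the electric part is integrated with many micro-steps at frozen temperature while the dissipated energy is accumulated; then the thermal part takes a single step in which this energy is added linearly, i.e. as the averaged power E/H.

```python
import math

R0, ALPHA_R, T_ENV = 100.0, 4e-3, 300.0   # ohm, 1/K, K (hypothetical values)
CAP = 1e-7                                 # capacitance [F]
M_HEAT, GAMMA = 1e-3, 1e-2                 # heat mass [J/K], cooling factor [W/K]

def source(t):
    """Voltage source: 5 V sine at 10 kHz."""
    return 5.0 * math.sin(2.0 * math.pi * 1e4 * t)

def electric_window(u, temp, t0, big_h, n_micro):
    """Step 1: implicit Euler micro-steps for C u' = (V - u)/R(T), T frozen."""
    r = R0 * (1.0 + ALPHA_R * (temp - T_ENV))
    dt = big_h / n_micro
    energy = 0.0
    for i in range(n_micro):
        v = source(t0 + (i + 1) * dt)
        u = (u + dt * v / (r * CAP)) / (1.0 + dt / (r * CAP))
        energy += dt * (v - u) ** 2 / r   # additional equation dE/dt = power
    return u, energy

def thermal_step(temp, energy, big_h):
    """Step 2: one implicit Euler step of M T' = -gamma (T - T_env) + E/H."""
    p_avg = energy / big_h
    return (M_HEAT * temp + big_h * (GAMMA * T_ENV + p_avg)) / (M_HEAT + big_h * GAMMA)

def cosimulate(t_end, big_h, n_micro):
    """Steps 0-3: loop over communication windows without iteration."""
    u, temp, t = 0.0, T_ENV, 0.0
    history = [(t, temp)]
    for _ in range(int(round(t_end / big_h))):
        u, energy = electric_window(u, temp, t, big_h, n_micro)
        temp = thermal_step(temp, energy, big_h)
        t += big_h
        history.append((t, temp))
    return history

history = cosimulate(t_end=0.5, big_h=1e-3, n_micro=1000)
print(f"temperature rise after 0.5 s: {history[-1][1] - T_ENV:.3f} K")
```

In this sketch the thermal node takes one step per window while the electric part takes a thousand, which mirrors the step counts one hopes for in a multirate setting; adding a convergence check and repeating steps 1) and 2) per window would turn it into a waveform relaxation scheme.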

Next we want to study the performance of that algorithm for a test-circuit. 



4 Simulation of a test-circuit 

For testing, we use a simple, but instructive, circuit (see Fig. 1), which
exhibits all effects. It consists of an operational amplifier, two resistors,
a diode, and a capacitor. Its general electrical behaviour is as follows: the
capacitor C is charged via the diode, and later discharged via the load
resistor $R_L$.

The accompanied thermal network consists of two 0D elements: the amplifier
as heat source, and the diode being thermally dependent. Finally, the
resistor R(T) is modelled as a 1D thermal element. Thus we cover the full
capabilities of the accompanied thermal network and so address all thermal
effects.

Using network analysis the circuit equations can be set up. Here we model 
R{T) as a piece of very thin copper with the following local resistance over 
[0,/] = [0,4 cm]: 

$p(x, T) = r_0\,(10 \cdot (l - x)\,x + c_0)\,(1 + \alpha\,(T(t,x) - T_{meas}) + \beta\,(T(t,x) - T_{meas})^2) / a$

(with Tmeas reference temperature, basic material resistivity ro -Cq = Pcu, 29 i 7 
and cross-section a; the specific heat shall be equally distributed). Further- 
more, we need a diode current: 



Fig. 1. Test-Circuit.

$i_D(u_{di}, T_{di}) = I_S(T_{di})\,[\exp(u_{di} / v_T) - 1]$

(with thermal voltage vt and saturation current Is, see [6]). For the ther- 
mal network, some dimensions of the concurring physical objects have to be 
assigned: volume, surface, texture. 
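The temperature dependence of the diode enters through the thermal voltage. The short sketch below evaluates the diode current $I_S\,[\exp(u/v_T) - 1]$ with the standard expression $v_T = k_B T / q$; how the saturation current itself depends on temperature is left to the caller, and the numerical value used for it is purely illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q = 1.602176634e-19    # elementary charge [C]

def thermal_voltage(temp):
    """Thermal voltage v_T = k_B T / q in volts."""
    return K_B * temp / Q

def diode_current(u, temp, i_sat):
    """Diode current I_S [exp(u / v_T) - 1]; i_sat is the saturation current at temp."""
    return i_sat * math.expm1(u / thermal_voltage(temp))

# Same forward voltage at two temperatures with an assumed, fixed I_S:
i_cold = diode_current(0.6, 300.0, 1e-14)
i_warm = diode_current(0.6, 350.0, 1e-14)
print(f"v_T(300 K) = {thermal_voltage(300.0) * 1e3:.2f} mV")
```

With a fixed saturation current the warmer diode carries less current at the same voltage; in a real device the strong growth of the saturation current with temperature usually dominates, which is why that dependence must be modelled as well.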

To run the above simulation algorithm, we have to equip the network with 
two additional equations 



dt 



E. 



tot, op 



— (^op) * JA") 



^^£^tot,tr 



- {ur{t)) ' Ja 



for the total energy over the communication window [t_0, t_0 + H]. Now, we
excite V(t) as a single sine wave extending over [0 sec, 25 μsec], and integrate
until t = 0.5 sec. The solution of the thermally modelled resistor is depicted
in Fig. 2 (left) and the single-rate and multirate solutions are compared at
time t = 0.5 sec. Here an absolute error of less than 5 • 10”^ K occurs in the 
multirate method. 




Fig. 2. Heat evolution of dragged on resistor (left). Final heat distribution at t = 
0.5sec (right). 



We compare single-rate results versus multirate co-simulation in terms of 
time steps: 





                                   signal time + heat evol.
                          steps    [0, 26 μsec]    [26 μsec, 0.5 sec]    comm. steps
single-rate               406      348             58                    -
multi-rate      network            339             60                    26
co-simulation   heat               1               25

We observe that the electric network dictates the number of steps in the
single-rate algorithm (since it is the faster component) and these steps are
reproduced by the multirate algorithm. Furthermore, in any communication
step the thermal network is computed in a single step. Thus the simulation
result is very close to the optimum: the electric circuit takes as many steps
as it needs to resolve the diode's switching (this number depends also on the
chosen parameters), while temperature has to do almost nothing. This of
course justifies the use of a decoupled thermal network, but only for time
integration over small scales.

Of course, these multirate scales can be found in chip technology, too.
Therefore we are going to point out the arising thermal 1D structures in the
recent SOI chip technology.

5 Application and Outlook 

Silicon-on-insulator (SOI) MOSFET chip technology in particular suffers from
self-heating: an additional oxide layer causes the MOSFET body to float
(smaller power consumption). Unfortunately, this layer behaves as thermal
insulation. The temperature rise affects the particle mobility and through that
the switching behaviour of the transistor. Arrays of SOI transistors (arrays of
gates) can be regarded as 1D elements: they are lined up on the silicon wafer,
and power supply and ground lines separate these chip portions. Mainly, the
push-pull amplifier dissipates power and heats up the chip. The local gate
activity causes a small additional heating. Thus the above modelling can be
applied (only the source term of (1D) in Box 1 becomes a sum). We obtain a
simple thermal network, consisting of 1D elements for each thermally modelled
line of transistors and a single 0D element for the amplifier.

The next step in the modelling hierarchy would involve 2D approximations:
a simple strategy is to use 1D elements in a grid formation. But 2D elements
can also be integrated in the above approach by specification of the
boundary conditions.

Of course, if the space dimensions shrink, the multirate potential decreases:
the thermal time constant is proportional to the square of the extension,
and so the application of this algorithm is tailored to dimensions of a few
micrometers for gigahertz signals and circuits (depending on the actual
parameters). The less separated the time scales are, the more a co-simulation
version of the algorithm seems adequate, i.e., the application of iterations.
In the regime of only slightly differing time scales, a full co-simulation is
still valid and an alternative to the all-at-once solution. Then the energy
coupling can be dropped as additional effort. So if a small-scale electric
technology exhibits macrostructures (1D) that are some orders of magnitude
larger, a fruitful application is expected.
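
The quadratic scaling of the thermal time constant can be made concrete with
a minimal sketch. It assumes the simple diffusion estimate tau = L^2 / alpha;
the diffusivity ALPHA_SI is an assumed room-temperature value for bulk
silicon and is not taken from the text, and the function names are
hypothetical.

```python
# Diffusion estimate of the thermal time constant: tau ~ L^2 / alpha.
# ALPHA_SI is an ASSUMED thermal diffusivity of bulk silicon near room
# temperature (m^2/s); it is an illustration value, not a parameter of the paper.
ALPHA_SI = 8.8e-5


def thermal_time_constant(length_m, alpha=ALPHA_SI):
    """Thermal time constant (s) of a structure of extension length_m."""
    return length_m ** 2 / alpha


def scale_separation(length_m, signal_hz, alpha=ALPHA_SI):
    """Ratio of the thermal time constant to one period of the electrical signal."""
    return thermal_time_constant(length_m, alpha) * signal_hz


for length in (1e-6, 10e-6, 100e-6):
    tau = thermal_time_constant(length)
    ratio = scale_separation(length, 1e9)
    print(f"L = {length * 1e6:6.1f} um   tau = {tau:.3e} s   tau * 1 GHz = {ratio:.3g}")
```

Under these assumptions, halving the extension reduces the separation between
the thermal and the electrical time scale by a factor of four, which is why
the multirate potential fades for very small structures.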

Summing up, we have pointed out the relevance of thermal-electric mod- 
els in chip design and the construction of fast simulation techniques. Our 
proposal is a first order thermal approximation called accompanied thermal 
network. This can be solved using a multirate co-simulation algorithm, which 
is based on an averaging technique. A test-circuit shows feasibility. 

This basic form of the algorithm has to be elaborated in the future;
various refinements and developments are conceivable. For instance: how do
refined coupling techniques, or interpolated extrapolation, behave? How can
we predict the communication step size H? Is it possible to use Newton-type
techniques to exploit the available derivative information? Can thermal 1D
elements be computed in parallel? Moreover, stronger couplings and smaller
spatial scales have to be investigated. The analytical side of the coupled
PDAE system also poses interesting questions of index and solvability.

Abstract. For solving sparse linear systems from circuit simulation whose
coefficient matrices include a few dense rows and columns, a parallel
Bi-CGSTAB algorithm with distributed Schur complement (DSC) preconditioning
is presented. The
parallel efficiency of the solver is increased by transforming the equation system 
into a problem without dense rows and columns as well as by exploitation of paral- 
lel graph partitioning methods. The costs of local, incomplete LU decompositions 
are decreased by fill-in reducing reordering methods of the matrix and a threshold 
strategy for the factorization. The efficiency of the parallel solver is demonstrated 
with real circuit simulation problems on a PC cluster. 



1 Introduction 



The simulation of large, highly integrated circuits leads to non-linear dif- 
ferential algebraic equations. For integration of these equations, accurate 
solution methods for sparse linear systems are required within the non- 
linear iterations. The corresponding matrices are real, non-symmetric, very 
ill-conditioned, have an irregular sparsity pattern, and include a few dense 
rows and columns. 

When the systems become large, iterative solvers are very likely to
outperform direct methods. Parallelization and appropriate preconditioning,
which accelerates convergence, are suitable techniques for reducing the
execution time of iterative solvers.

We present a parallel Bi-CGSTAB algorithm with distributed Schur com- 
plement (DSC) preconditioning [4] which achieves an accuracy of the solution 
similar to a direct solver but usually is distinctly faster for large problems. 
The parallel efficiency of the method is increased by transforming the equa- 
tion system into a problem without dense rows and columns as well as by 
exploitation of parallel graph partitioning methods. The costs of local, incom- 
plete LU decompositions are decreased by fill-in reducing reordering methods 
of the matrix and a threshold strategy for the factorization. 

2 Problem of Dense Rows and Columns

Matrices from circuit simulation problems are usually very sparse but include 
a few (nearly) dense rows and columns. In the parallel case, dense rows and 
columns are difficult to handle for partitioning methods since they result 
in couplings between all equations. In addition, good load balance is hard 
to achieve if the matrix is distributed row-wise. Fill-in reducing ordering
methods may become very costly due to a few dense rows and columns, and 
the matrices may get very ill-conditioned. 

Fortunately, dense rows and columns are usually easy to remove from circuit
simulation matrices, since the corresponding columns or rows as a rule have
only one non-zero entry, on the diagonal. Such equations normally involve
voltage sources (constraints). A dense column whose corresponding row has
only a diagonal entry can be removed, since the corresponding unknown can be
determined from the row equation and substituted into all other equations.
Conversely, a dense row (equation) whose corresponding column has only a
diagonal entry is the only equation that involves the corresponding unknown;
all other equations can be solved independently, and this unknown is computed
afterwards. If the columns or rows corresponding to dense rows and columns
have more than the diagonal entry, such rows and columns can be handled by
using the Woodbury formula [2]. This case is rare for circuit simulation
problems and does not occur for the matrices investigated here.
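
The two elimination rules can be sketched on small dense lists. This is a
minimal illustration only: the function names are hypothetical, the matrices
are toy data, and a production code would of course operate on a sparse
storage format.

```python
def eliminate_dense_column(A, b, j):
    """Remove unknown j when row j of A holds only its diagonal entry.

    Row j then reads A[j][j] * x_j = b[j], so x_j is known and is substituted
    into all other equations. Returns (A_reduced, b_reduced, x_j).
    """
    n = len(A)
    assert all(A[j][k] == 0 for k in range(n) if k != j), "row j must be diagonal-only"
    x_j = b[j] / A[j][j]
    keep = [i for i in range(n) if i != j]
    A_red = [[A[i][k] for k in keep] for i in keep]
    b_red = [b[i] - A[i][j] * x_j for i in keep]
    return A_red, b_red, x_j


def drop_dense_row(A, b, j):
    """Remove equation j when column j of A holds only its diagonal entry.

    Unknown j appears in no other equation, so the remaining system can be
    solved first. Returns (A_reduced, b_reduced).
    """
    n = len(A)
    assert all(A[i][j] == 0 for i in range(n) if i != j), "column j must be diagonal-only"
    keep = [i for i in range(n) if i != j]
    return [[A[i][k] for k in keep] for i in keep], [b[i] for i in keep]


def recover_unknown(A, b, j, x_rest):
    """Compute x_j from the dense row j once all other unknowns are known."""
    keep = [i for i in range(len(A)) if i != j]
    coupled = sum(A[j][k] * x for k, x in zip(keep, x_rest))
    return (b[j] - coupled) / A[j][j]


# Dense last column, diagonal-only last row.
A_col = [[2, 0, 1], [0, 4, 3], [0, 0, 5]]
A_red, b_red, x_last = eliminate_dense_column(A_col, [3, 10, 10], 2)

# Dense last row, diagonal-only last column.
A_row = [[2, 0, 0], [0, 4, 0], [1, 3, 5]]
b_row = [1, 4, 13.5]
A_red2, b_red2 = drop_dense_row(A_row, b_row, 2)
x_dense = recover_unknown(A_row, b_row, 2, [0.5, 1.0])
```

In both cases the reduced system no longer contains the dense row or column,
which is the property the partitioning and ordering steps rely on.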

3 Distributed Schur Complement Techniques 

3.1 Definitions 

Fig. 1 (left) schematically displays the row-wise distribution of a matrix
$A$ to two processors. Each processor owns its local row block. The square
matrices $A_i$ are the local diagonal blocks of $A$. We assume that the local
rows are arranged in such a way that the rows without couplings to the other
processor(s) come first and then the rows with couplings. The former are
called internal rows; they have entries only in the $A_i$ part of the local
rows and are not coupled with rows of other processors. The latter
additionally have entries outside the $A_i$ part or are coupled with rows of
other processors. These local rows are named local interface rows. The part
outside $A_i$, which represents couplings between the processors, is called
the local interface matrix $X_i$. From the view of processor 2 in Fig. 1
(left), the local interface rows of processor 1 with entries at column
positions in the area of $X_2$ are external interface rows. Since the
sparsity pattern of circuit simulation matrices usually is non-symmetric,
local interface rows of processor i may have entries in $A_i$ only but be
uni-directionally coupled with rows of other processors. These rows are
external interface rows from the view of the other processors. This cannot
be determined locally on processor i; communication is necessary. Since each
row of the matrix corresponds to a specific unknown of the equation system
(row 1 to solution vector component 1 and to right hand side component 1,
e.g.), internal unknowns, local interface unknowns, and external interface
unknowns can be defined correspondingly.

[Figure 1. Right panel: an outer Bi-CGSTAB iteration for all local rows
(unknowns) encloses an inner Bi-CGSTAB iteration for the local interface rows
(unknowns); both contain a matrix-vector multiplication with communication of
external interface unknowns.]

Fig. 1. Left: DSC definitions: Matrix distributed to two processors. Right:
Schematic view of the DSC algorithm on each processor.



3.2 Algorithm 



Fig. 1 (right) gives a schematic overview of the DSC algorithm per processor.
On each processor an outer Bi-CGSTAB iteration [5] is performed for all local
rows (unknowns). As the basic iterative method, a flexible variant of GMRES,
FGMRES [3,4], is also well suited for the DSC algorithm but is not considered
here because of its higher storage requirements. The outer iteration contains
a partial matrix-vector multiplication which requires communication, since
each processor only owns its local segment of the vector. It is necessary to
exchange components of non-local vector segments which correspond to
external interface unknowns (rows).

Within the outer Bi-CGSTAB iteration, an inner Bi-CGSTAB iteration is
performed for the local interface rows (unknowns) only. This includes a
partial matrix-vector multiplication of the interface system, but the
communication scheme is the same as for the outer matrix-vector
multiplication and thus has to be implemented only once.
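
For orientation, the basic Bi-CGSTAB recurrence that both the outer and the
inner iteration build on can be sketched sequentially. This is a plain,
unpreconditioned, single-process version on dense lists; the distributed
matrix-vector product, the ILUT preconditioning and the nesting of the DSC
algorithm are deliberately left out, and the small test matrix is made up
for illustration.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def norm(x):
    return math.sqrt(dot(x, x))


def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned Bi-CGSTAB for A x = b, starting from x = 0.

    Stops when the residual norm divided by the initial residual norm drops
    below tol. Returns (x, number_of_iterations).
    """
    n = len(b)
    x = [0.0] * n
    r = [float(v) for v in b]          # residual b - A x for x = 0
    r_hat = r[:]                       # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    r0_norm = norm(r)
    if r0_norm == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if norm(s) / r0_norm < tol:    # converged after the first half step
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if norm(r) / r0_norm < tol:
            return x, it
    return x, max_iter


# Small non-symmetric, diagonally dominant test system (illustrative data).
A_test = [[4.0, 1.0, 0.0, 0.0],
          [2.0, 5.0, 1.0, 0.0],
          [0.0, 1.0, 6.0, 2.0],
          [1.0, 0.0, 1.0, 7.0]]
b_test = [1.0, 2.0, 3.0, 4.0]
x_test, iterations = bicgstab(A_test, b_test)
residual = norm([bi - axi for bi, axi in zip(b_test, matvec(A_test, x_test))])
```

In the parallel setting only `matvec` and `dot` need communication; the
recurrence itself is unchanged.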

From the mathematical point of view, each processor i solves the following
equation:

$$A_i x_i + X_i y_{i,\mathrm{ext}} = b_i . \qquad (1)$$

Here $x_i$ are the local vector components, $y_{i,\mathrm{ext}}$ the external
interface vector components, and $b_i$ is the local segment of the right hand
side vector. $x_i$ is split into the internal vector components $u_i$ and the
local interface vector components $y_i$; $b_i$ is split accordingly into
$f_i$ and $g_i$.

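$A_i$ is then split (see [4] for details), and (1) is reformulated:

$$\begin{pmatrix} B_i & F_i \\ E_i & C_i \end{pmatrix}
\begin{pmatrix} u_i \\ y_i \end{pmatrix}
+ \begin{pmatrix} 0 \\ \sum_{j \in \mathrm{Neighbours}_i} E_{ij} y_j \end{pmatrix}
= \begin{pmatrix} f_i \\ g_i \end{pmatrix} . \qquad (2)$$

The result of the sum over all neighbouring processors j with couplings to
processor i in (2) is the same as that of $X_i y_{i,\mathrm{ext}}$ in (1).
$E_{ij} y_j$ is the part of $X_i y_{i,\mathrm{ext}}$ which reflects the
contribution to the local equation from the neighbouring processor j.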

The matrix equation (2) represents two equations. From the first, we
derive an expression for $u_i$, substitute $u_i$ in the second equation, and
get

$$u_i = B_i^{-1} (f_i - F_i y_i), \qquad
S_i y_i + \sum_{j \in \mathrm{Neighbours}_i} E_{ij} y_j = g_i - E_i B_i^{-1} f_i . \qquad (3)$$

$S_i = C_i - E_i B_i^{-1} F_i$ is the local Schur complement. Note that (3)
is an equation for the interface vector components only.

(3) can be rewritten as a block-Jacobi preconditioned Schur complement
system [4]:

$$y_i + S_i^{-1} \sum_{j \in \mathrm{Neighbours}_i} E_{ij} y_j
= S_i^{-1} (g_i - E_i B_i^{-1} f_i) . \qquad (4)$$
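
The elimination of the internal unknowns can be checked on a toy problem.
The sketch below treats a single processor without neighbours, so the sum
over $E_{ij} y_j$ vanishes; it uses exact rational arithmetic and a small
hypothetical helper `solve` instead of a sparse factorization, and the block
entries are made up.

```python
from fractions import Fraction


def solve(M, rhs):
    """Gauss-Jordan elimination for a small dense system M x = rhs (exact)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # first usable pivot
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[r][n] for r in range(n)]


def schur_solve(B, F, E, C, f, g):
    """Solve [[B, F], [E, C]] [u; y] = [f; g] via S = C - E B^{-1} F."""
    nb, m = len(B), len(C)
    # Columns of B^{-1} F, one solve per interface unknown.
    binv_f_cols = [solve(B, [row[k] for row in F]) for k in range(m)]
    binv_f = solve(B, f)
    S = [[C[r][k] - sum(E[r][t] * binv_f_cols[k][t] for t in range(nb))
          for k in range(m)] for r in range(m)]
    rhs = [g[r] - sum(E[r][t] * binv_f[t] for t in range(nb)) for r in range(m)]
    y = solve(S, rhs)                                   # interface system, cf. (3)
    f_minus_Fy = [f[i] - sum(F[i][k] * y[k] for k in range(m)) for i in range(nb)]
    u = solve(B, f_minus_Fy)                            # u = B^{-1} (f - F y)
    return u, y


B = [[4, 1], [1, 3]]
F = [[1], [0]]
E = [[0, 2]]
C = [[5]]
u, y = schur_solve(B, F, E, C, [1, 2], [3])

# Reference: solve the assembled 3 x 3 system directly.
A_full = [[4, 1, 1], [1, 3, 0], [0, 2, 5]]
x_direct = solve(A_full, [1, 2, 3])
```

Both routes give the same vector, which illustrates that (3) determines the
interface unknowns without ever forming the full system.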



3.3 Preconditioning 



Fig. 2 illustrates the principle of preconditioning within the DSC algorithm
per processor. The outer iteration from Fig. 1 (right) is preconditioned per
processor by a block incomplete LU decomposition with threshold (ILUT)
[3] of the local diagonal block ($L_i U_i$ in Fig. 2). For preconditioning
the inner iteration, a block ILUT for the local interface rows only is
exploited. This factorization need not be computed separately but can be
taken from the lower right part of the decomposition for the outer iteration
($L_{i,S} U_{i,S}$ in Fig. 2).

Mathematically speaking, we perform a block factorization of $A_i$ on
processor i using the splitting from (2):

$$A_i = \begin{pmatrix} B_i & F_i \\ E_i & C_i \end{pmatrix}
= \begin{pmatrix} B_i & 0 \\ E_i & S_i \end{pmatrix}
\begin{pmatrix} I & B_i^{-1} F_i \\ 0 & I \end{pmatrix} . \qquad (5)$$

We then assume that we have the LU decomposition $S_i = L_{i,S} U_{i,S}$ of
the local Schur complement. With this, we formulate the LU factorization

$$L_i U_i = \begin{pmatrix} L_{i,B} & 0 \\ E_i U_{i,B}^{-1} & L_{i,S} \end{pmatrix}
\begin{pmatrix} U_{i,B} & L_{i,B}^{-1} F_i \\ 0 & U_{i,S} \end{pmatrix} \qquad (6)$$

with $B_i = L_{i,B} U_{i,B}$ the LU decomposition of $B_i$. By transforming
the right hand side of (6) into

$$\begin{pmatrix} L_{i,B} & 0 \\ E_i U_{i,B}^{-1} & L_{i,S} \end{pmatrix}
\begin{pmatrix} U_{i,B} & 0 \\ 0 & U_{i,S} \end{pmatrix}
\begin{pmatrix} I & U_{i,B}^{-1} L_{i,B}^{-1} F_i \\ 0 & I \end{pmatrix}
= \begin{pmatrix} B_i & 0 \\ E_i & S_i \end{pmatrix}
\begin{pmatrix} I & B_i^{-1} F_i \\ 0 & I \end{pmatrix}$$

we find after comparison with (5) that $L_i U_i$ is an LU factorization of
$A_i$. Conversely, we also see from (6) the practical advantage that the
LU factorization $S_i = L_{i,S} U_{i,S}$ of the local Schur complement does
not have to be computed explicitly if we already have an LU factorization of
the local diagonal block $A_i$.

Fig. 2. Principle of preconditioning within the DSC algorithm.
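
The statement that the factors of the Schur complement come for free can be
verified numerically. The following sketch uses a complete (not incomplete)
LU factorization without pivoting in exact rational arithmetic; the 4 x 4
matrix and the helper names are hypothetical, and the internal block is
fixed to size 2 so that its inverse can be written down explicitly.

```python
from fractions import Fraction


def lu_nopivot(A):
    """Doolittle LU factorization without pivoting; L has a unit diagonal."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    return L, U


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def trailing(M, nb):
    """Lower right block of M after the first nb rows and columns."""
    return [row[nb:] for row in M[nb:]]


def inv2(M):
    """Inverse of a 2 x 2 matrix (exact)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[4, 1, 1, 0],
     [1, 3, 0, 1],
     [0, 2, 5, 1],
     [1, 0, 1, 4]]
nb = 2                                   # two internal, two interface unknowns
B = [row[:nb] for row in A[:nb]]
F = [row[nb:] for row in A[:nb]]
E = [row[:nb] for row in A[nb:]]
C = [row[nb:] for row in A[nb:]]

# Schur complement from its definition S = C - E B^{-1} F.
EBF = matmul(matmul(E, inv2(B)), F)
S = [[c - x for c, x in zip(c_row, x_row)] for c_row, x_row in zip(C, EBF)]

# Lower right blocks of the factors of the full local block.
L, U = lu_nopivot(A)
S_from_lu = matmul(trailing(L, nb), trailing(U, nb))
print(S_from_lu == S)
```

With incomplete factors the same lower right blocks give an approximation of
the Schur complement factors rather than the exact ones.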

If we perform incomplete decompositions, we get an approximate,
preconditioned Schur complement system with the approximation $\tilde{S}_i$
of the local Schur complement $S_i$ (compare with (4)):

$$y_i + \tilde{S}_i^{-1} \sum_{j \in \mathrm{Neighbours}_i} E_{ij} y_j
= \tilde{S}_i^{-1} (g_i - E_i B_i^{-1} f_i) . \qquad (7)$$



3.4 Repartitioning and Reordering 

The distributed sparsity pattern of the matrix can be represented as a dis- 
tributed graph with nodes and edges. Graph repartitioning can then be used 
to reduce the number of couplings between the distributed matrix row blocks. 
In graph theory formulation, the reduction is done by a minimization of the 
number of edges cut in the graph. This goal of graph partitioning corre- 
sponds to a minimization of the number of interface unknowns in the DSC 
algorithm, and thus problem (7) is made very small. For graph partitioning, 
we use the ParMETIS software from the University of Minnesota [1]. Since
ParMETIS requires an undirected graph as input, the non-symmetric pattern
of the matrix has to be symmetrized for the matrix graph construction.

For local, incomplete decompositions, we use METIS nested dissection 
reordering to reduce fill-in into the factors [1]. Nested dissection reordering 
usually generates a similar sparsity pattern for the local diagonal blocks Ai 
on each processor i. This results in similar fill-in for each ILUT and thus 
supports load balancing. 
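
How the symmetrized matrix graph relates to the interface size can be shown
on a toy pattern. The sketch does not call ParMETIS or METIS; it only builds
the undirected graph from a non-symmetric pattern and evaluates two
hand-chosen partitions, with hypothetical function names.

```python
def symmetrized_graph(pattern):
    """Undirected adjacency sets from a set of (row, col) non-zero positions."""
    adj = {}
    for i, j in pattern:
        if i != j:
            adj.setdefault(i, set()).add(j)
            adj.setdefault(j, set()).add(i)
    return adj


def edge_cut(adj, part):
    """Number of graph edges whose end points lie in different parts."""
    return sum(1 for i in adj for j in adj[i] if i < j and part[i] != part[j])


def interface_nodes(adj, part):
    """Nodes (unknowns) coupled to at least one node of another part."""
    return {i for i in adj if any(part[j] != part[i] for j in adj[i])}


# Non-symmetric pattern: diagonal plus the first upper off-diagonal only.
pattern = {(i, i) for i in range(6)} | {(i, i + 1) for i in range(5)}
graph = symmetrized_graph(pattern)

contiguous = [0, 0, 0, 1, 1, 1]     # rows 0-2 on processor 0, rows 3-5 on processor 1
alternating = [0, 1, 0, 1, 0, 1]    # a deliberately poor distribution

cut_good = edge_cut(graph, contiguous)
cut_bad = edge_cut(graph, alternating)
interface_good = interface_nodes(graph, contiguous)
interface_bad = interface_nodes(graph, alternating)
```

The partition with the smaller edge cut also has far fewer interface
unknowns, which is the quantity that determines the size of system (7).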

4 Results

The following experiments were performed on an SGI workstation (300
MHz, 1 GB main memory), NEC CCRLE’s PC cluster SUCCESS (six 4-way
SMP nodes with Pentium III Xeon CPUs, 500 MHz, 1 GB main memory
per node, Giganet interconnection network between the nodes), and NEC
CCRLE’s PC cluster GRISU (32 2-way SMP nodes with AMD Athlon MP
1900+ CPUs, 1.6 GHz, 1 GB main memory per node, Myrinet2000
interconnection network between the nodes).

In all tests with the DSC algorithm, the outer iteration was stopped if 
the residual norm of the total equation system divided by the initial norm 
was less than 10“^^. The inner iteration was stopped if the residual norm of 
the interface system divided by the initial norm was less than 10“^^. These 
criteria resulted in high accuracy of the solution in all our experiments. 



4.1 Sequential Results: Iterative versus Direct Solver 

Table 1 compares sequential execution times of a direct method (software 
from Saad’s SPARSKIT [3], threshold for ILUT: 0) and of the iterative DSC 
solver for five equation systems from the simulation of NEC circuits ^ on an 
SGI workstation. The ILUT thresholds in the DSC case for the matri- 
ces row2m, 256md, Simys, ccp, and circ2a are 10“^, 10“®, 10“^, 10~"^, and 
10~^, respectively. In all following experiments, these thresholds are applied. 
METIS nested dissection is used for reordering. For the reduced matrices,
dense rows and columns are treated by direct substitution (see Section 2).
The results in Table 1 show that this elimination of dense rows and columns
accelerates the direct solver significantly. The times for the iterative DSC
solver are distinctly shorter than or comparable with the times of the direct
solver for the reduced matrices. In the two cases where the direct solver is
faster, the matrices 256md and circ2a, fill-in in the complete LU factors is
exceptionally small.



4.2 Parallel Results 

Original versus Reduced System. Table 2 presents execution times of
the DSC algorithm on eight processors of SUCCESS for the original and the
reduced system. The number of interface variables and of outer iterations
(see Fig. 1, right) are given in addition. Repartitioning and reordering are
applied. The results in Table 2 show that the times for the reduced systems
are significantly shorter than the times for the original systems. This is
mainly due to a distinctly smaller number of interface variables in the case
of the reduced systems: for circ2a, for example, it drops from 482874 to 481,
and the time from 2944.90 s to 1.85 s. Repartitioning is effective in this
case and results in very low costs for the inner iteration from Fig. 1
(right). Therefore, all following experiments are performed with the reduced
systems.

^ The matrices can be made available on request. 

Table 1. Sequential results on an SGI workstation: Direct versus iterative DSC solver.

          Original matrix       Reduced matrix        Direct, time/s       DSC, time/s
Matrix    Order     Non-zeros   Order     Non-zeros   Original   Reduced   Reduced
row2m      16570      127130     16502      109731        4.4       3.5        1.1
256md      26114      160079     26076      141011        2.3       1.5        1.9
Simys      24705      183713     24585      127695       13.4       5.4        1.0
ccp        89556      760630     89378      603753       51.1      27.9        5.6
circ2a    482969     3912413    482963     2750390     7169.5      15.5       20.0


Table 2. DSC times on 8 SUCCESS processors: Original versus reduced system.

          Original system                      Reduced system
Matrix    #interface vars   Time/s   #iter.    #interface vars   Time/s   #iter.
row2m           9123         21.69     3             1792          0.80     2
256md          14263          1.53     2             8160          0.69     2
Simys          22950          1.33     2             1504          0.12     2
ccp            35484         23.70     2             2102          1.01     2
circ2a        482874       2944.90     2              481          1.85     1


Effect of Reordering and Repartitioning. In Table 3, times of the DSC
method on eight SUCCESS processors with both repartitioning and reordering,
with repartitioning only, and without repartitioning and reordering are
displayed. The number of interface variables is given for the first and last
scenario; for the second, the number is the same as for the first one.
Additionally, the fill-in deviation is specified, i.e. the difference of the
maximum fill-in and the mean fill-in per processor divided by the mean
fill-in. This is a measure for the degree of load imbalance during the
construction and application of local ILUT factorizations.
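
The fill-in deviation defined above is a one-line formula; a minimal sketch
with made-up per-processor fill-in counts (not measurements from the
experiments) reads:

```python
def fill_in_deviation(fill_per_processor):
    """(maximum fill-in - mean fill-in) / mean fill-in over all processors."""
    mean = sum(fill_per_processor) / len(fill_per_processor)
    return (max(fill_per_processor) - mean) / mean


# Hypothetical fill-in counts of the local ILUT factors on four processors.
balanced = [100, 110, 90, 100]
unbalanced = [100, 400, 50, 50]

print(f"balanced:   {fill_in_deviation(balanced):.1%}")
print(f"unbalanced: {fill_in_deviation(unbalanced):.1%}")
```

A value of 0% means that every processor carries the same fill-in; values
above 100% mean that the most loaded processor holds more than twice the
average.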

The shortest times by far in Table 3 are achieved for the DSC method with
repartitioning and reordering. Without reordering, load imbalance increases
significantly (see column Deviation and Section 3.4). In addition, total
fill-in for ILUT without reordering usually is distinctly higher than with
reordering. For matrix circ2a, e.g., total fill-in is 5245267 in the first
scenario and 17561396 in the second. For the third scenario, graph
partitioning is not applied, and thus the significantly increased number of
interface variables leads to further loss of performance.



Scalability. Table 4 shows times of the DSC method on up to 64 GRISU
processors for the two largest test cases. The speedup for ccp on eight
processors is 2.3, the corresponding speedup for circ2a is 4.0. This shows
that the scaling behaviour improves significantly with increasing problem
size. On 24 processors, a speedup of 6.7 is achieved for circ2a, while the
speedup on 64 processors is only moderately increased to 7.4. The use of
higher processor numbers only makes sense for larger problems since the
matrices are extremely sparse.
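
Speedup and parallel efficiency follow directly from the measured times. The
sketch below uses the times listed in Table 4; because the table entries are
rounded to two decimals, the recomputed speedups for 24 and 64 processors can
differ in the last digit from the values quoted in the text.

```python
# Times in seconds on p processors, as listed in Table 4.
TIMES_CCP = {1: 0.78, 2: 0.64, 4: 0.38, 8: 0.34}
TIMES_CIRC2A = {1: 2.11, 2: 1.70, 4: 0.90, 8: 0.53,
                12: 0.41, 16: 0.37, 24: 0.32, 64: 0.29}


def speedup(times, p):
    """Sequential time divided by the time on p processors."""
    return times[1] / times[p]


def efficiency(times, p):
    """Speedup divided by the number of processors."""
    return speedup(times, p) / p


for p in sorted(TIMES_CIRC2A):
    print(f"circ2a  p = {p:2d}  speedup = {speedup(TIMES_CIRC2A, p):4.1f}  "
          f"efficiency = {efficiency(TIMES_CIRC2A, p):5.1%}")
```

The efficiency column makes the saturation visible: beyond a few dozen
processors the additional processors contribute very little for matrices of
this size.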

Table 3. DSC times on 8 SUCCESS processors: Effect of ordering and partitioning.

          Partitioning + ordering          Partitioning only    No permutation
Matrix    #If. vars  Deviation  Time/s     Deviation  Time/s    #If. vars  Deviation  Time/s
row2m        1792       4.0%     0.80       139.0%     2.96       10823      63.5%     22.15
256md        8160      10.4%     0.69       251.3%     2.38       25182     102.7%      6.30
Simys        1504       9.0%     0.12        16.2%     0.31       17996      95.5%      2.68
ccp          2102      10.7%     1.01        30.5%     2.01       65856     133.2%     76.10
circ2a        481       0.6%     1.85         2.2%     6.62        8752       1.1%     37.86

Table 4. DSC times on GRISU: Scalability.

          Time/s on p processors
Matrix    p = 1   p = 2   p = 4   p = 8   p = 12   p = 16   p = 24   p = 64
ccp        0.78    0.64    0.38    0.34      -        -        -        -
circ2a     2.11    1.70    0.90    0.53    0.41     0.37     0.32     0.29


5 Conclusions 

For equation systems from real circuit simulation runs, we demonstrated 
that the removal of dense rows and columns is crucial for direct and iterative 
solvers, in particular in the parallel case. For large problems, the iterative 
DSC algorithm presented usually is superior to a direct solver and shows a 
favourable scaling behaviour. To achieve the latter, graph partitioning and 
local ordering methods are necessary since partitioning keeps the interface 
system small and local ordering improves load balance besides fill-in reduction 
for local factorizations. Combined with these techniques, the DSC method 
presented is a well suited iterative solver for circuit simulation. 

Abstract. This paper focuses on the development of a model to obtain
qualitative insight into the behaviour of large, but finite, phased arrays of
microstrip
antennas. This model concerns a finite array of simple elements, namely perfectly 
conducting, infinitely thin, narrow rings, excited by voltage gaps and positioned in 
free or half space. The currents on the rings, and from that the electromagnetic 
field, are calculated by a moment method. Dimension analysis is carried out to 
reduce numerical effort and to acquire insight in the behaviour of the array. The 
qualitative analysis shows promising results and although numerically a brute force 
method has been applied, CPU times are still acceptable. 



1 Problem Description 

Currently, Thales Nederland is realizing new radar systems consisting of
large phased arrays of microstrip antennas. These arrays consist of about
1000 antenna elements positioned on an antenna face of about 16 m^2. The
systems scan in azimuth by rotation and in elevation by phase shifts. A
narrow main lobe and low side lobe level, an impedance match with the feeding
network of the array, and a low cross polarization are design goals.

To analyse such arrays, either a finite array model (or element-by-element 
approach) or an infinite array model is used at Thales Nederland. The infinite 
array model requires much less computation time and data storage demand 
than the finite array model. However, since it cannot account for edge effects 
and differences between the antenna elements, the need for a finite array 
model still exists. 

Since the actual geometry of the antenna elements is complicated, simulating
a finite array with these elements modelled in detail would require too many
computing resources to be feasible. Therefore, we have decided to develop a
model based on simple elements that will enable us to find the
characteristics that describe the qualitative behaviour of large phased
arrays.

2 Modelling



The main requirements on our model are the following. Firstly, the arrays 
are finite such that boundary effects can be incorporated. Secondly, compu- 
tation times should be in the order of hours. Thirdly, the algorithm should 
be based on analytical expressions to provide insight in the characteristics or 
characteristic parameters of an array. 

The qualitative model concerns a finite planar array of simple elements
in free space or above an infinitely wide ground plate. Considering the
radiating part of the actual antenna elements, i.e. a rectangular microstrip
loop, we have chosen perfectly conducting, infinitely thin, narrow
ring-shaped microstrips, in short rings, as elements; see Fig. 1. The reasons
for this choice are twofold: a ring is the simplest loop geometry, and the
modes on this geometry can be described analytically.

The rings are excited by voltage gaps at a certain frequency with
corresponding wavelength $\lambda$ and wave number $k$. On each ring, the gap
is uniform with respect to the width and can be positioned arbitrarily. The
widths $2b_q$ of the rings are all of the same order, but much smaller than
the wavelength, the radii, and the distances between the rings. In other
words, $k b_q \ll 1$, $\beta_q = b_q/a_q \ll 1$, and
$b_p/(L_{pq} - a_p - a_q) \ll 1$, where $L_{pq}$ is the distance between the
centers $c_p$ and $c_q$ of rings p and q; see Fig. 1. If a ground plate is
present, the rings are situated above this plate at height $h$.





Fig. 1. Geometry of an array of two rings. 



Since the excitation is time harmonic, so is the electromagnetic field,
and therefore a (spatial) time-harmonic representation of this field is used.
The time dependence is suppressed. The total electric field is written as
the sum of a scattered electric field $E^{\mathrm{scat}}$ and an excitation
field $E^{\mathrm{ex}}$. The scattered electric field is expressed in terms
of the unknown current density $J$ on the rings by an integro-differential
operator acting on the current,

$$E^{\mathrm{scat}} = \mathcal{L} J . \qquad (1)$$

The operator $\mathcal{L}$ can be factorized as
$\mathcal{L} = \mathcal{D}\mathcal{T}$, where $\mathcal{D}$ is the
differential operator and $\mathcal{T}$ is the integral operator described by
a Green’s kernel, where $\mathcal{T}J$ is the magnetic vector potential. The
condition that the tangential component of the total electric field is zero
at the surface $S$ of the rings yields an equation for the current,

$$(\mathcal{L}J)_{\mathrm{tan}} = -(E^{\mathrm{ex}})_{\mathrm{tan}} \quad \text{on } S . \qquad (2)$$

Here, $(\cdot)_{\mathrm{tan}}$ is a trace operator that restricts a vector
field to the surface $S$ and projects this field tangentially on $S$.
Equation (2) is solved by a moment method expanding the unknown current into
a finite number of expansion functions with unknown coefficients. Once the
current is known, the electromagnetic far field can be calculated
analytically.

3 Analytical Aspects 

In this section, we outline the essential steps of the calculation of the
current $J$. A publication with more details is in preparation and will
appear in 2003 as a pre-publication in [1].

Let the surface of the q-th ring be $S_q$. On each $S_q$, a polar parameter
representation is chosen, the orientation of which is described by the angle
$\psi_q$; see Fig. 1. The voltage gap on $S_q$ is positioned at
$\vartheta_q = \pi$. Hence, $\psi_q$ determines not only the orientation of
the parameter representation, but also the position of the gap. Two tangent
vectors correspond in the usual way to this representation. Together with the
corresponding normal, they form a local coordinate system on $S_q$, which is
extended straightforwardly to a global coordinate system.

It is assumed that the current $J_q = J|_{S_q}$ is directed along the
centerlines of the rings and that it is uniform with respect to the width
$b_q$,

(3)

The basis of this assumption is that the wavelength is much larger than the
widths of the rings. Expressing $\mathcal{L}J_q$, i.e. the scattered electric
field induced by ring q, in the coordinate system of $S_p$, we calculate
$(\mathcal{L}J_q)_{\mathrm{tan}}$ on $S_p$ straightforwardly by putting the
axial coordinate equal to zero and omitting the axial component. Then, a
differential operator $\mathcal{D}_{r_p \vartheta_p}$ is determined such that

$$(\mathcal{L}J_q)_{\mathrm{tan}}|_{S_p} = \mathcal{D}_{r_p \vartheta_p} [\mathcal{T}J_q|_{S_p}] , \qquad (4)$$

where $|_{S_p}$ denotes the restriction to $S_p$. Hence, the projection of
the trace operator is incorporated in $\mathcal{D}_{r_p \vartheta_p}$.

The impedance matrix component for a test function $V_p$ and an expansion
function $J_q$ is given by

$$\langle V_p, (\mathcal{L}J_q)_{\mathrm{tan}}|_{S_p} \rangle , \qquad (5)$$

where $\langle \cdot, \cdot \rangle$ is the inner product on functions
defined from $S$ to the tangent space of $S$. It is shown that the variation
of $(\mathcal{L}J_q)_{\mathrm{tan}}|_{S_p}$ in $r_p$ is of order $\beta_p$
with respect to its variation in $\vartheta_p$. This implies that
$(\mathcal{L}J_q)_{\mathrm{tan}}|_{S_p}$ depends only weakly on $r_p$.
Therefore, the test function $V_p$ is chosen uniform with respect to $r_p$
and tangentially directed only. This means

Vp(rp,??p) = (6) 

Then, the impedance matrix component (5) turns into 

|5p> — [ I {'^rp'dp[TJq\sp])'dp da{rp^dp). (7) 

'^'dp J Vp 

Neglecting terms of order $\beta_p$, we reverse the order of the differential
operator and the integral with respect to $r_p$. This leads to averaging of
the Green’s kernel with respect to the radial source and observation
coordinates. In case p = q, the averaged kernel has a logarithmic
singularity; otherwise it is regular. Requiring that the test function $V_p$
and the expansion function $W_q$ have square integrable second and first
generalized derivatives, respectively, we transfer the reversed differential
operator to $V_p$. The resulting differential operator incorporates the
Helmholtz operator. Together with the periodic boundary conditions, this
operator induces a Sturm-Liouville problem, the eigenfunctions of which are
chosen as test functions. Then, the expansion functions are chosen equal to
the test functions. The resulting impedance matrix component is a double
integral, which can be reduced to a single integral in case p = q by use of
properties of inner product and convolution.

Choosing a finite number of test and expansion functions on each $S_p$,
we obtain an impedance matrix built up of blocks, which describe the self
and mutual coupling of the rings. The blocks on the diagonal are diagonal
matrices describing the self coupling of each ring, while the other blocks
are dense matrices describing the mutual coupling between each pair of rings.
The expansion coefficients are calculated by an LU-factorization of the
impedance matrix.
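
The block structure of the impedance matrix can be sketched as follows. The
entries are arbitrary placeholder numbers, not values of the Green’s kernel
integrals; the function names are hypothetical, and the solver is plain
Gaussian elimination with partial pivoting, i.e. an LU factorization applied
on the fly to the right hand side.

```python
def assemble_impedance(self_diag, mutual_block):
    """Assemble the (P*N) x (P*N) impedance matrix for P rings with N modes each.

    self_diag[p]       : N diagonal entries of the self-coupling block of ring p
    mutual_block(p, q) : dense N x N mutual-coupling block for p != q
    """
    P, N = len(self_diag), len(self_diag[0])
    Z = [[0j] * (P * N) for _ in range(P * N)]
    for p in range(P):
        for m in range(N):
            Z[p * N + m][p * N + m] = complex(self_diag[p][m])     # diagonal block
        for q in range(P):
            if q != p:
                block = mutual_block(p, q)                          # dense block
                for m in range(N):
                    for n in range(N):
                        Z[p * N + m][q * N + n] = complex(block[m][n])
    return Z


def lu_solve(Z, v):
    """Solve Z c = v by Gaussian elimination with partial pivoting."""
    n = len(Z)
    A = [row[:] for row in Z]
    b = [complex(x) for x in v]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    c = [0j] * n
    for i in reversed(range(n)):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return c


# Two identical rings with two modes each; all numbers are placeholders.
self_diag = [[2 + 1j, 3 - 1j], [2 + 1j, 3 - 1j]]
mutual = lambda p, q: [[0.1, 0.05j], [0.05j, 0.1]]
Z = assemble_impedance(self_diag, mutual)
excitation = [1, 0, 1, 0]
coeffs = lu_solve(Z, excitation)
residual = max(abs(sum(Z[i][j] * coeffs[j] for j in range(4)) - excitation[i])
               for i in range(4))
```

Since only the off-diagonal blocks are dense, the storage for P rings with N
modes each is dominated by the P(P-1) mutual blocks of size N x N.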



4 Numerical Results 

The first result we show is used for validation of the implementation. We 
show the real and imaginary part of the current through one ring in free 
space, excited at a frequency of 3 GHz; see Fig. 2. We can compare the result 
with known results from literature: the current through a wire loop excited 
by a voltage gap at the same frequency; see [2, Fig. 2 and 3] and [3]. Here, we 
use a rule of thumb found by Kraus [4, p. 238], which states that the results 
for a thin strip of width $w$ and a wire with cross-sectional radius $w/4$
are equivalent. Two expansion functions already give an accurate result for
the real part of the current. Four expansion functions give a quite accurate
result for the imaginary part of the current, except near the voltage gap at
$\vartheta = \pi$.
The reason for the latter is the following. It can be shown that since the cur- 
rent has a square integrable generalized derivative, the excitation field should 
be square integrable. However, the delta functions that describe the voltage 
gaps are not square integrable. Furthermore, the expansion functions do not 
only have square integrable generalized derivatives, but are even continuously 
differentiable. 

Figure 3 shows the effect of a ground plate. Figure 3(a) shows the current
amplitude for a ring in free space, and at $h = \lambda/4$, $h = \lambda/2$,
and $h = \lambda$ above a ground plate. The current is normalized on the
maximum amplitude in free space. Due to interference, the amplitude for
$h = \lambda/4$ is lower than for free space, and for $h = \lambda/2$ and
$h = \lambda$ higher. Due to space attenuation, the amplitude for
$h = \lambda$ is lower than for $h = \lambda/2$. Figure 3(b) shows the far
field components in the yz-plane for free space and for $h = \lambda/4$.
Here, a spherical coordinate system is chosen that is related to the
Cartesian coordinate system in Fig. 1 in the usual way. The influence of the
ground plate can be observed from the behaviour of the $\varphi$-component,
i.e. the cross-polarization, which vanishes at $\theta = 90^\circ$ (endfire)
for the ground plate, but not for free space.

Figure 4 shows results for two line arrays of 7 identical rings with spacings
$7\lambda/15$ and $3\lambda/5$. Here, 8 expansion functions per ring is a
suitable choice. The orientation of the local coordinate systems on the rings
is such that $\psi_q = 0$. As mentioned above, the voltage gaps, all of equal
amplitude, are positioned at $\vartheta_q = \pi$. The centers of the rings
are positioned on the positive x-axis, where the center of the first ring is
in the origin; see Fig. 1. The CPU time of a Matlab implementation on an HP
PC with Windows NT, an Intel Pentium IV processor at 1.0 GHz, and 256 MB of
RAM is 69 s. Figures 4(a)-(b) show the normalized radiation intensities in
the xz- and yz-plane (H- and E-plane) together with the intensity of one
ring. In the xz-plane, one main lobe and several side lobes are observed for
both spacings, where the number of lobes is related to the spacing. In the
E-plane, only one lobe is observed. These results are in qualitative
correspondence with results from literature; see [5, Chapter 3]. The array
with larger spacing not only has more side lobes, but its maximal radiation
intensity is also higher. This effect is due to the degree of mutual
coupling. The influence of mutual coupling on the current is shown in
Fig. 4(c)-(f). For smaller spacing, the amplitudes of the currents differ
significantly from the amplitude on one ring, while for larger spacing, they
differ only slightly. In particular, the maxima of the amplitudes for the
spacing $7\lambda/15$ are significantly lower than the maximum for one ring.
For the spacing $3\lambda/5$, these maxima are both slightly higher and lower
than the maximum for one ring. The phases differ for both spacings from the
phase on one ring.
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5 Conclusions 

We have developed a tool for analysing finite arrays of rings that is easy
to handle. The algorithm is based on analytic expressions, as required. The
validation has been successful; the results have been shown to be in qualitative
correspondence with literature and practice. Although a brute-force method has
been applied numerically, the CPU times are acceptable. However, to analyse
large arrays of about 100 elements or more, they should be reduced. The
accuracy is sufficient for qualitative analysis.



6 Prospects 

The characteristics of arrays and the essential aspects of the antenna
elements will be the topic of further research. A transparent relation should be
established between excitation, geometry, and scattered field. Finally, feedback
should be provided to the hardware designers of Thales Nederland.

Fig. 2. The total current on a ring excited by a voltage gap of 1 V at a frequency
of 3 GHz; a = 0.0637λ, b = 0.0027λ, c = (0,0,0), and ψ = 0. (a) Real part; 2
expansion functions of cosine type. (b) Imaginary part; blue, red, black: 4, 6, and
8 expansion functions of cosine type.




Fig. 3. A ring excited by a voltage gap of 1 V with 8 expansion functions of cosine
type; ka = 2π/5, β = 1/40, c = (0, 0, 0), ψ = 0. (a) Current amplitudes normalized
on the maximum amplitude in free space; blue: free space; green: h = λ/4; red:
h = λ/2; black: h = λ. (b) Far field components in the yz-plane normalized on
the maximum of |E_φ|; blue: free space; black: h = λ/4; solid lines: E_φ-components
(co-polarization); dashed lines: E_θ-components (cross-polarization).

Fig. 4. Results for line arrays of 7 rings of equal radii (ka_q = 2π/6) and widths
(β_q = 3/50), with voltage gaps of equal amplitude (1 V), and positioned in free
space.
(a)-(b): Radiation intensities in the xz- and yz-plane, resp., normalized on the
maximum intensity of one ring; blue: one ring; black: spacing 7λ/15
(c_q = 7λ(q − 1)/15); red: spacing 3λ/5 (c_q = 3λ(q − 1)/5).
(c)-(f): Current amplitudes and phases for spacings 7λ/15 ((c), (e)) and 3λ/5 ((d),
(f)); blue: one ring; green/red/black/purple: from first ring to center ring.

Abstract. The idea of modelling space as two interacting equivalent networks, one
for currents, one for magnetic fluxes, has pervaded computational electromagnetics
since its beginnings. The Yee scheme and the TLM method can be interpreted this
way. But this is also true of more recent proposals inspired by finite elements or
finite volumes, as we show, so the idea is not incompatible with “unstructured”
meshes. Yet, meshes with some rotational and translational symmetry (locally, at
least) are desirable on many accounts. The tetrahedral Sommerville mesh we
describe here, able to fit curved boundaries and yet regular, looks like an
interesting compromise.



1 Introduction 

Sommerville [3], working in the Twenties on the problem of space-filling tetra- 
hedra (the story is well told in [2]), found one which is especially interesting 
in the context of modern FDTD-like methods. 

Such Yee-like methods [4] [5] retain the essential features of FDTD, i.e., 
they can be interpreted as the formation of two interlocked equivalent net- 
works, one magnetic, one electric, but the underlying pavings need not be the 
two staggered cubic lattices of Yee’s scheme. They can be “cell-complexes in 
duality”, meaning there is a primal mesh, made of nodes, edges, etc., of a 
paving by convex cells of any shape, and a dual mesh, made of dual volumes, 
dual facets, etc., in one-to-one correspondence with the primal ones. Degrees 
of freedom are magnetic fluxes, emf’s, relative to primal cells, and mmf’s, 
currents, relative to dual ones. “Network” equations, which are nothing else 
than a generalization of Kirchhoff’s laws, are then set up, in a way which 
is essentially unique. But they must be supplemented by “network constitu- 
tive laws”, in order to couple emf’s with currents, and mmf’s with fluxes. 
These laws, which are encoded in square symmetric matrices ν and ε, the
size of which is the number of primal facets and edges respectively, must
be constructed in analogy with the constitutive laws H = νB, D = εE (or
J = σE).

But such analogies are not that compelling, and in contrast to network 
equations, network constitutive laws come in many flavors, almost as many 
as investigators. For instance [1], the use of edge elements in a Galerkin varia- 
tional approach does result in such laws, but with the drawback of producing 
non-diagonal ν and ε matrices, while efficiency in the simulation requires
diagonal matrices, in order to have an explicit time evolution scheme. “Diagonal
lumping” procedures can help in this respect, but their implementation
still raises many problems.

Another approach (which on the face of it doesn’t seem to require finite
elements, but holding this view would be wrong) consists in building “mutually
orthogonal” meshes: each primal edge pierces its associate dual facet
at right angle, etc. It’s then straightforward [4] [5] to get diagonal μ and ε
matrices. But whereas the Galerkin approach is available on any reasonably
behaved primal mesh, very few primal meshes will allow construction of an
orthogonal dual. This condition seems to impose a kind of rigidity on the
primal mesh which severely constrains its topology and its metric. The cubic
lattice of Yee’s scheme satisfies these constraints, and still does after some
distortion. But the notorious “staircasing” problem plays against it. Hence
the search for more flexible substitutes. The Sommerville mesh, based on the
first Sommerville space-paving tetrahedron, promises to be one.



2 Yee-like methods 

The geometric approach to electromagnetism stems from the remark that
all its observables are integrals over some manifold of dimension p (a line if
p = 1, a surface if p = 2, etc.). For instance, the electric field is known via
electromotive forces (emf’s), which are line integrals such as $\int_c \tau \cdot E$, where
c is a curve with unit tangent vector τ, and E the electric vector field. Let’s
denote by e(c), or for better mnemonic value, by $\int_c e$, this emf. This enhances
the status of the electric field as a mapping, from curves to real numbers (with
obvious properties of additivity and continuity), assuming the value $\int_c e$ when
applied to c, a map we denote by the symbol e. What counts, physically, is
this mapping e, not the “proxy” vector field E by which it can be expressed,
once a metric-defining scalar product has been introduced. A similar argument
shows that magnetic induction is a map, denoted b, from surfaces to
reals, whose values b(S), again better denoted by $\int_S b$, are induction fluxes.
Mappings of this kind are called “p-forms” in differential geometry, where p
refers to the dimension.

In this spirit, Maxwell’s equations become differential relations between
forms, as follows:

    \partial_t \int_S b + \int_{\partial S} e = 0,                            (1)

    -\partial_t \int_\Sigma d + \int_{\partial\Sigma} h = \int_\Sigma j,      (2)

where ∂ denotes the boundary, for all surfaces S and Σ. (The S vs Σ notation
is meant to stress a difference in the way these surfaces are oriented:
each S has “inner” orientation [4], i.e., a specified clockwise gyration sense,
which matches the orientation of the boundary ∂S, whereas Σ has “outer”
orientation, i.e., a crossing direction through it, which matches the way to
“turn around” ∂Σ. See [1] for more details.)

The basic discretization move, then, is to satisfy (1) and (2), not on all
surfaces S and Σ, but on all those (in finite number, if one works in a bounded
region) built from facets of suitable meshes.

So let m be a meshing of the computational domain D, made of volumes
v (indexed over the set $\mathcal{V}$), which have in common facets f (indexed over $\mathcal{F}$),
which hinge along edges e (indexed over $\mathcal{E}$), which meet at nodes n (indexed
over $\mathcal{N}$). Each of these p-cells is given an orientation of its own, hence the
definition of incidence numbers (e.g., $R_f^e$ between edge e and facet f), which
say whether two cells of dimension p and p+1 meet (otherwise, the number
is 0), and if they do, whether their orientations match or not (e.g., $R_f^e = \pm 1$
depending on whether e goes along the gyratory sense attributed to f or
counter to it). Other incidence numbers $G_e^n$ and $D_v^f$ are similarly defined
and form (rectangular) matrices G, R, D. It’s easy to see that RG = 0 and
DR = 0.
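
As a minimal sketch (not taken from the paper), the incidence matrices of the smallest possible mesh, a single tetrahedron, can be built and the identities RG = 0 and DR = 0 checked directly. The orientation convention (every cell oriented by increasing node number) and the helper names `boundary`, `incidence` and `matmul` are assumptions made for this illustration.

```python
from itertools import combinations

def boundary(cell):
    """Signed faces of a simplex given as an increasing tuple of node numbers:
    dropping the k-th vertex contributes that face with sign (-1)**k."""
    return [((-1) ** k, cell[:k] + cell[k + 1:]) for k in range(len(cell))]

def incidence(high_cells, low_cells):
    """Matrix with one row per (p+1)-cell and one column per p-cell."""
    M = [[0] * len(low_cells) for _ in high_cells]
    for row, cell in enumerate(high_cells):
        for sign, face in boundary(cell):
            M[row][low_cells.index(face)] = sign
    return M

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

nodes = [(n,) for n in range(4)]            # 0-cells
edges = list(combinations(range(4), 2))     # 1-cells, oriented low -> high
facets = list(combinations(range(4), 3))    # 2-cells
volumes = [tuple(range(4))]                 # the single 3-cell

G = incidence(edges, nodes)      # 6 x 4, discrete gradient
R = incidence(facets, edges)     # 4 x 6, discrete curl
D = incidence(volumes, facets)   # 1 x 4, discrete divergence

print(matmul(R, G))  # all entries are zero
print(matmul(D, R))  # all entries are zero
```

Both products vanish for purely combinatorial reasons: no lengths, areas or material data enter at this stage.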

Let’s now decide to represent b and e by (time-dependent) arrays
$\mathbf{b} = \{b_f : f \in \mathcal{F}\}$ and $\mathbf{e} = \{e_e : e \in \mathcal{E}\}$ of “degrees of freedom” (DoF), interpreted
as fluxes and emf’s relative to the individual facets and edges of m. It’s
an approximate representation, in the sense that it tells about fluxes [resp.
emf’s] relative to “m-surfaces” only, i.e., those made of facets of m [resp. to
“m-lines”]. To satisfy (1) for facet f means, as one will easily realize,

    \partial_t b_f + \sum_{e \in \mathcal{E}} R_f^e \, e_e = 0,

and this, if true for all f, enforces (1) for all m-surfaces, by additivity. Hence,
in matrix form, a first group of equations between DoF arrays:

    \partial_t \mathbf{b} + R \mathbf{e} = 0,                                 (3)

a spatial discretization of (1). (Time-discretization will be straightforward,
and we gloss over it.) In search for an analogous group in similar relation to
(2), we notice that DoF’s for h and d (or j) should “sit at the same place”
as those for b and e, owing to the local character of the constitutive laws
b = μh and d = εe (or j = σe). Hence the decision to consider a dual mesh m̃,
whose respective nodes, edges, etc., are in one-to-one correspondence with the
volumes, facets, etc., of m: a dual node inside each primal volume, a dual edge
piercing each primal facet, and so forth. (Notice this is exactly the situation
with the staggered grids in FDTD.) This way, the incidence matrices of the
dual mesh appear to be the transposes $D^t$, $R^t$, $G^t$ of the primal ones. Then,
again, DoF arrays $\mathbf{h} = \{h_f : f \in \mathcal{F}\}$ and $\mathbf{d} = \{d_e : e \in \mathcal{E}\}$ approximately
represent h and d, data about j translate (by integration over the dual facets)
into a known time-dependent array $\mathbf{j} = \{j_e : e \in \mathcal{E}\}$, and a second group of
equations is obtained:

    -\partial_t \mathbf{d} + R^t \mathbf{h} = \mathbf{j}.                     (4)

Note the almost compulsory nature of this discretization process: all we have
done is enforce (1) and (2) for all m-surfaces S and all m̃-surfaces Σ in the
only possible manner.

By duality of the two meshes, the form of the discrete constitutive laws
is h = νb and d = εe, where ν and ε are square matrices. How to construct
them is the core of the discretization problem. It can be solved in essentially
two ways, the “orthogonal construction” and the “barycentric construction”,
which give birth to a lot of variants.

First, one may in some cases have two “mutually orthogonal” meshes. By
this it is meant that not only all cells, primal and dual, are straight (i.e., for
p = 1 or 2, contained in a line or a plane), a requirement not made so far,
but that a pair of mated cells are orthogonal: the dual edge f̃ is orthogonal
to facet f, etc. (Fig. 1), and that their orientations (inner for one, outer for
the other) match, too. Then a very natural way to build ν and ε is to have
them diagonal with entries

    \nu^{ff} = \nu_f \, \mathrm{length}(\tilde f)/\mathrm{area}(f), \qquad
    \varepsilon^{ee} = \varepsilon_e \, \mathrm{area}(\tilde e)/\mathrm{length}(e),      (5)

where $\nu_f$ and $\varepsilon_e$ are the local values of ν and ε, if these are smooth functions
of position. (The case of discontinuous ones is hardly more difficult, see [1].)
With such matrices and a leap-frog scheme with time step δt small enough
(for stability), (3) and (4) plus the “network constitutive laws”,

    \mathbf{h} = \nu \mathbf{b},                                              (6)

    \mathbf{d} = \varepsilon \mathbf{e},                                      (7)

give birth to an explicit forward-marching integration scheme, which is nothing
else than FDTD in the case of a cubic lattice-like mesh and its dual.
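
The time stepping itself is glossed over in the text; the following is one possible leap-frog sketch under stated assumptions, not the author's implementation. It assumes diagonal ν and ε stored as lists, uses the incidence matrices R and D of a single tetrahedron (oriented by increasing node number) as a stand-in mesh, and picks unit coefficients and an arbitrary initial emf purely for illustration. The function name `leapfrog` is hypothetical.

```python
# Incidence matrices of one tetrahedron: R is 4 facets x 6 edges, D is 1 volume x 4 facets.
R = [[1, -1, 0, 1, 0, 0],
     [1, 0, -1, 0, 1, 0],
     [0, 1, -1, 0, 0, 1],
     [0, 0, 0, 1, -1, 1]]
D = [[-1, 1, -1, 1]]

def leapfrog(R, nu, eps, e, b, j, dt, steps):
    """Explicit marching of (3), (4) with diagonal constitutive laws (6), (7).
    nu[f] and eps[k] are the diagonal entries; e, b, j are DoF arrays."""
    e, b = list(e), list(b)
    n_f, n_e = len(R), len(R[0])
    for _ in range(steps):
        # (3): d_t b = -R e
        for f in range(n_f):
            b[f] -= dt * sum(R[f][k] * e[k] for k in range(n_e))
        # (6): h = nu b
        h = [nu[f] * b[f] for f in range(n_f)]
        # (4) with (7): eps d_t e = R^t h - j
        for k in range(n_e):
            curl_h = sum(R[f][k] * h[f] for f in range(n_f))
            e[k] += dt * (curl_h - j[k]) / eps[k]
    return e, b

e, b = leapfrog(R, nu=[1.0] * 4, eps=[1.0] * 6,
                e=[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], b=[0.0] * 4,
                j=[0.0] * 6, dt=0.1, steps=50)
div_b = sum(D[0][f] * b[f] for f in range(4))   # stays at zero up to rounding
```

Because DR = 0, the update of b never changes Db, so the discrete divergence of b remains zero throughout the run if it was zero initially.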




Fig. 1. (one dimension suppressed). Dual edges in the orthogonal construction (left)
and the barycentric one (right)



Actually, (5) is a way to enforce, in the orthogonal case, a “consistency
criterion” of wider scope, namely

    \sum_{f' \in \mathcal{F}} \nu^{ff'} \, f' = \nu \, \tilde f, \qquad
    \sum_{e' \in \mathcal{E}} \varepsilon^{ee'} \, e' = \varepsilon \, \tilde e,          (8)

where the on-the-line f′, ẽ and f̃, e′ stand for vectors: the vectorial area
of facets f′ and ẽ, the vector joining the end points of edge f̃, the vector
along e′. It can be shown (this is where finite elements are necessary, as
a way to reconstruct fields from DoF arrays, and thus make a comparison
between the exact and the approximate possible) that (8), which mixes the
entries of ν with metric information about the mesh, is a necessary and,
under reasonable complementary assumptions, sufficient condition for the
convergence of the above scheme.

Designing numerical schemes for the Maxwell equations, in this approach,
thus reduces to building a pair of meshes in duality and matrices ν and ε
which satisfy (8). (Given the pair of meshes, such matrices always exist. The
difficulty is to have them symmetric.) Remarkably, and this is the second
way to solve the discretization problem, the so-called “mass matrices” of
facet elements and edge elements on a tetrahedral mesh, when taken as ν and ε
respectively, happen to satisfy (8) when the dual mesh is the barycentric one.
The Galerkin approach with Whitney elements, therefore, does not essentially
differ from the Yee scheme and its modern avatars [4] [5]: it just trades the
inconvenience of having non-diagonal matrices ε and μ (this can be alleviated
by “mass lumping” techniques) for ease in building the dual mesh.

This is a genuine trade-off, for the “orthogonal dual and diagonal matrices”
approach is not that straightforward: as Fig. 2 should make plain,
only a very limited family of primal meshes will admit suitable duals in the
required mutual orthogonality relationship. On the other hand, mutual
orthogonality, because of the diagonality that comes with it owing to (8), is a very
desirable feature, which offers an attractive interpretation of (3), (4), (6), (7). We
now turn to that.




Fig. 2. Mutual orthogonality is easily achieved with some meshes (left), but
regularity is not enough in this respect, even in dimension 2 (right). Odds are rather
small that an arbitrary primal mesh, even good-looking, admits an orthogonal dual.

3 Interlocked equivalent networks

Suppose b = 0 and e = 0 at time t = 0, and let q stand for the node-based
array of electric charges defined by $\mathbf{q}(t) = \int_0^t G^t \mathbf{j}(s)\,ds$. Then (3) and (4)
imply Db = 0 and −G^t d = q. Displaying the equations like this (μ is the
inverse of ν):

    G^t \mathbf{d} = -\mathbf{q}, \qquad \mathbf{d} = \varepsilon \mathbf{e}, \qquad R\mathbf{e} = -\partial_t \mathbf{b},       (9)

    D\mathbf{b} = 0, \qquad \mathbf{b} = \mu \mathbf{h}, \qquad R^t \mathbf{h} = \mathbf{j} + \partial_t \mathbf{d},             (10)

with a symmetry which would be even more striking if we added (non-physical)
magnetic currents and charges, we see that (9) and (10) describe two
networks, subject to Kirchhoff’s laws, each one exciting the other. Equations
(9) rule the primal, or electric network: G^t d = −q expresses charge
conservation at nodes, Re = −∂_t b is the loop law, with loop emf’s due to
flux variations at the right-hand side, and d = εe links branch emf’s with
branch (displacement) currents. Sources are q, a given, and ∂_t b, broadcast
by the other network. The latter, magnetic, also has its node law, Db = 0,
its branch permeances (the entries of μ), and its loop law, with a source mmf
provided, concurrently, by the given currents j and the displacement currents
communicated by the first network. A lot of variations are possible on this
basic theme, including the coupling with outside circuits, lumped elements,
etc., which come very naturally.
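
The two node laws follow in a few lines from the identities RG = 0 and DR = 0. The derivation below is a sketch under the assumption that q is the time integral of G^t j (so that −G^t plays the role of a discrete divergence) and that all arrays vanish at t = 0:

```latex
\begin{align*}
D\,\partial_t \mathbf{b} &= -DR\,\mathbf{e} = 0
  && \text{(apply $D$ to (3), use $DR = 0$)}\\
\Rightarrow\quad D\mathbf{b}(t) &= D\mathbf{b}(0) = 0,\\[4pt]
-\partial_t\, G^t \mathbf{d} + (RG)^t\, \mathbf{h} &= G^t \mathbf{j}
  && \text{(apply $G^t$ to (4))}\\
\Rightarrow\quad -G^t \mathbf{d}(t) &= \int_0^t G^t \mathbf{j}(s)\,ds = \mathbf{q}(t)
  && \text{(use $RG = 0$ and $\mathbf{d}(0) = \varepsilon\,\mathbf{e}(0) = 0$).}
\end{align*}
```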

Of course, the suggested interpretation of μ and ε as branch permeabilities
and permittivities is a bit blurred when these matrices are not diagonal:
It all goes as if, for instance, flux in “branch” f̃ (which one can construe
as going through facet f) depended not only on the mmf $h_f$ in this very
branch, but also on mmf’s associated with the nearby dual edges. Hence the
interest for mesh-design methods that would give the best of both worlds:
easy construction of an orthogonal dual, on the one hand, and better fitting
of curved boundaries, which is what tetrahedral volumes are felt useful for,
on the other.

4 The Sommerville mesh 

The Sommerville mesh is very promising in this respect. Its generating volume,
here referred to as “the sommerville” (Fig. 3), is a tetrahedron with two
edges of length a and four of length b, with 3a² = 4b². It paves space (Figs
4 and 5), forming a primal mesh which happens to be the Voronoi-Delaunay
one associated with a body-centered cubic lattice (hence the orthogonality of
the dual mesh). Note the octahedra are not regular, here. They are slightly
squashed along the vertical direction. So this is not the standard “octet truss”
tessellation, with, in two-to-one proportion, regular tetrahedra and octahedra
(the latter, cut into four parts). But remarkably, it is an even more symmetrical
paving. Although there seems to be, at first glance, a privileged direction
(vertical), along which center masts would align, one soon realizes that this is
not the case. Let’s call “beams” the edges parallel to the natural Cartesian
axes, and “struts” the slanted edges. Each beam, not only each vertical one, is
surrounded by four sommervilles forming a squashed octahedron, but now
squashed along the central beam’s direction. The network, therefore, is as
“isotropic” as can be achieved, with a correlative improvement of numerical
dispersion relations with respect to FDTD.




Fig. 3. Four tetrahedra like the one in the rear can be assembled around the vertical
pillar to form an octahedron (not a regular one). If c = a, i.e., if 3a² = 4b², we
have a Sommerville tetrahedron there. Point o marks the position of the center of
the circumscribed sphere, well inside the tetrahedron (even if the latter is slightly
deformed).

Fig. 4. Thanks to the 3a² = 4b² relation, two additional Sommerville tetrahedra (up
right and bottom left) complete the octahedron to make a space-filling hexahedron.
The sommerville is therefore a space-filler.



The dual mesh is provided by the orthogonal construction: take the
circumcenters, and join them. Dual edges obtained this way are automatically
orthogonal to primal facets, and the other way round. (Moreover, they
do meet each other, which is not warranted in the more general Voronoi-Delaunay
construction.) Like the primal mesh, this one is also a paving by a
single space-filler, namely, the truncated octahedron (Fig. 7), one of the so-called
Archimedean polyhedra (also known as “tetrakaidecahedron”, which
would already by itself justify the visit). But the staircasing problem, though
still present, is much less acute. The mesh tolerates distortion, like the cubic
one. Refinement is easy, as a sommerville splits into eight smaller ones.

Fig. 5. Indeed, one may stack the hexahedra thus obtained, which amounts to
combining octahedra and tetrahedra in the familiar “octet truss” pattern: first lay
the octahedra side by side, then add sommervilles, two for each octahedron, thus
obtaining a horizontal egg-crate shaped slab, with pyramidal holes, ready to be
filled by a similar slab, superposed, and so on.

Fig. 6. A few dual facets (bounded by dual edges, themselves obtained by connecting
the circumcenters of adjacent tetrahedra while turning around a given primal
edge). Those around beams are squares, those around struts are hexagons.

Implementation is not difficult. One first sets up a cubic lattice, nodes of
which are numbered with even integers, the generic node thus being {2p, 2q, 2r}.
A second, staggered lattice with odd nodal coordinates is then added. Edges
of both grids are our beams. Now join each “even” node to the eight “odd”
nodes around it: here are the struts. It’s convenient to label edges (and
for that matter, all cells) by the coordinates of their mid-points. This way,
{2p, 2q + 1, 2r} points to a beam, {2p ± 1/2, 2q ± 1/2, 2r ± 1/2} to a strut,
etc. This makes it easy to span the set of edges which “interact”, in the discrete
analogue of the curl-curl equation, with a given one, i.e., those edges
which share a facet with it. Two struts interact, in this sense, if only one of
their “coordinates” differs, by ±1. A strut and a beam interact if the three
coordinates differ by ±1/2, etc. Thanks to such rules, the network equations
can be formed “by hand”, in easy finite difference fashion, as with the Yee
scheme.
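
The labelling rules above can be sketched in a few lines. This is an illustration only, with hypothetical function names; it lists the mid-points of the edges meeting at one lattice node, checks the relation 3a² = 4b² for this lattice (beams of length 2, struts joining a node to a neighbour at offset (±1, ±1, ±1)), and encodes the two interaction rules quoted in the text.

```python
from itertools import product

def beam_midpoints(node):
    """Mid-points of the 6 beams meeting at `node`, a lattice point whose three
    coordinates are all even or all odd; each beam has length a = 2."""
    return [tuple(c + s * (k == axis) for k, c in enumerate(node))
            for axis in range(3) for s in (-1, 1)]

def strut_midpoints(node):
    """Mid-points of the 8 struts joining `node` to its opposite-parity neighbours."""
    return [tuple(c + s / 2 for c, s in zip(node, signs))
            for signs in product((-1, 1), repeat=3)]

def struts_interact(m1, m2):
    """Rule from the text: exactly one coordinate differs, by +-1."""
    return sorted(abs(a - b) for a, b in zip(m1, m2)) == [0, 0, 1]

def strut_beam_interact(strut_mid, beam_mid):
    """Rule from the text: all three coordinates differ by +-1/2."""
    return all(abs(a - b) == 0.5 for a, b in zip(strut_mid, beam_mid))

node = (0, 0, 0)
beams = beam_midpoints(node)
struts = strut_midpoints(node)

a2 = 2 ** 2          # squared beam length
b2 = 1 + 1 + 1       # squared strut length, e.g. from (0,0,0) to (1,1,1)
partners = [m for m in struts if struts_interact((0.5, 0.5, 0.5), m)]
```

Each node thus carries 6 beams and 8 struts, 14 edges in all, which is also the number of faces of the truncated octahedron that forms the dual cell around the node.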




Fig. 7. The dual volume is a truncated octahedron, a known space filler. There is 
one around each primal node. 

Abstract. A novel substrate coupling simulation tool named SubCALM is presented.
It is well suited to floorplanning of large mixed-signal designs since it exploits
the boundary element method and contains a Poisson solver based on a
hierarchical O(n) conjugate gradient algorithm. Sophisticated preconditioners are
applied, which further increase the computation speed by a factor of about 10. The
approach is verified by experimental results in a 0.25 µm BiCMOS technology.



1 Introduction 

Modern mixed-signal IC designs consist of an increasing number of analogue 
and digital subcircuits due to the ongoing trend to higher integration. This 
enables digital interference to couple to sensitive analogue nodes through the 
substrate [1]. Typical substrate structures of a BiCMOS technology are shown
in Fig. 1a. Most of the substrate noise on chip level is coupled via low-ohmic
substrate contacts and the power supply network [2]. Hence floorplanning
software tools should be provided in order to estimate how a certain place- 
ment of subcircuits and a power supply strategy influence substrate coupling 
effects in a given mixed-signal design. Traditional substrate coupling simula- 
tion tools do not meet this demand since they simply perform an impedance 
extraction of a flattened layout [3]. 

Fig. 1. (a) Typical substrate structures in BiCMOS integrated circuits. (b) Definition
of layers in a BiCMOS technology.

Therefore, a novel simulation tool named SubCALM has been developed.
It is well suited to floorplanning of mixed-signal designs due to its hierarchical
approach and capability to process large layouts. It exploits a substrate model
based on the boundary element method (BEM) as explained in Sect. 2. The
subsequent section presents a new approach to incorporating wells in this
BEM model. The proposed algorithm may be accelerated by preconditioners
(Sect. 4). Simulation and measurement results are shown in Sect. 5 for a
0.25 µm BiCMOS technology.

2 Substrate Resistance Extraction 

The substrate of a CMOS, BiCMOS or bipolar technology may be seen as a
semi-infinite halfspace of silicon with different layers of resistivity depending
on the doping density level, like channel stoppers, epitaxial layers and the
bulk as shown in Fig. 1b. In most cases time-variant magnetic fields and
displacement currents can be neglected for frequencies below 10 GHz, leading
to a simple electrostatic current flow problem between m contacts [4]. The
admittance of an m-contact geometry can be summarised by a symmetric
m × m admittance matrix Y.



2.1 Boundary Element Description 

SubCALM exploits the boundary element method [5] for solving such an electrostatic
problem. Choosing the BEM leads to a smaller number of variables,
which is more suitable for huge problems, although it is clear that BEM does
not reach the accuracy of finite element methods [4].

If a current with a certain density distribution J(r_0) is injected into
contact j, the potential φ at a certain point r inside the substrate may be
calculated as

    \phi(r) = \int_{\Gamma_j} G(r, r_0) \, J(r_0) \, dr_0                      (1)

where Γ_j is the surface of contact j. G(r, r_0) is the modified Green’s
function which accommodates the special problem of a semi-infinite layered
halfspace. G simply expresses the potential at a certain point r, if a unit
current is sourced into an infinitely small point r_0.



2.2 Green’s Function for Layered Media 



Inside the substrate the Green’s function G has to satisfy

    \nabla \cdot \left( \frac{1}{\rho(r)} \nabla G(r, r_0) \right) = -\delta(r - r_0)      (2)

where ρ(r) denotes the resistivity at point r. Due to radial invariance,
separation of variables in cylindrical coordinates may be used to simplify (2)
to

    G(r, z, z_0) = \frac{1}{2\pi} \int_0^\infty Z(k, z, z_0) \, J_0(kr) \, dk      (3)

J_0 is the Bessel function of the first kind of order 0. G has to meet several
boundary conditions that can be used to determine Z:

1. Homogeneous Neumann boundary at z = 0
2. Z(k, z, z_0) and (1/ρ) ∂Z/∂z continuous at layer interfaces
3. lim_{z→∞} Z(k, z, z_0) = 0
4. At the injection point z_0: a jump condition on the derivative of Z, set by
   the unit source

Considering these properties, G may be calculated according to (3). This is a
one-dimensional Hankel transform, which can be evaluated by using a fast
Hankel transform algorithm [6], as Z is smooth in the k-domain.
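
A quick consistency check of (3), offered as a sketch rather than as part of the original derivation: for a uniform halfspace of resistivity ρ with the source on the surface (z_0 = 0), and assuming the normalisation with prefactor 1/(2π) in which the kernel is Z(k, z, 0) = ρ e^{−kz}, the Hankel integral can be evaluated in closed form.

```latex
\begin{align*}
G(r, z, 0) &= \frac{\rho}{2\pi} \int_0^\infty e^{-kz}\, J_0(kr)\, dk
            = \frac{\rho}{2\pi \sqrt{r^2 + z^2}},\\[4pt]
G(r, 0, 0) &= \frac{\rho}{2\pi r}.
\end{align*}
```

This is the familiar potential of a point current source on the surface of a homogeneous halfspace, and it reproduces the 1/r-relationship expected near the injection point.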

Figure 2 presents a Green’s function G of the given arbitrary layer structure
between contacts located at the surface z = 0. This example reveals
some important properties of G. The electric field near the injection point
has a simple 1/r-relationship which is similar to the field of a single layer
halfspace with ρ = ρ_1. However, if the distance gets bigger, the current does
not penetrate the thin high-ohmic layer 2 underneath. Hence G changes to
a log(r)-relationship up to a certain distance r, where the interface area r²π
to the underlying layer 3 becomes so large that it is now inevitable for the
current to flow along the shortest possible path through layer 2 into the low-ohmic
layer 3. If the distance is increased further, the 1/r-relationship returns
with ρ = ρ_3.




Fig. 2. Green’s function for a semi-infinite layered halfspace.

The same arguments may be used in order to derive some major properties
of the substrate coupling behaviour of the BiCMOS technology shown
in Fig. 1b. Guard rings will be quite effective in this technology since the
resistivity between adjacent substrate contacts is low, whereas the coupling
between distant ones is smaller due to the less doped bulk.

2.3 Accelerated Resistance Calculation



We use Galerkin’s approach in order to solve (1) numerically for J. Each
contact is subdivided into smaller rectangles assuming constant potential and
constant current density across the rectangle. For each rectangle an equation
is derived which relates the known potential φ_i of rectangle i to the
contribution of injected currents I_j into all rectangles j of all contacts. If there
are n such rectangles, this results in a dense linear system Z i = v with an
impedance matrix Z ∈ ℝ^{n×n}, a vector i ∈ ℝ^n containing all currents I_j and
a vector v ∈ ℝ^n consisting of potentials φ_i. Each entry of Z is given by

    Z_{ij} = \frac{1}{A_i A_j} \int_{\Gamma_i} \int_{\Gamma_j} G(r_i, r_j) \, dr_j \, dr_i      (4)

where Γ_i is the surface of rectangle i with area A_i and Γ_j denotes rectangle
j with area A_j. The admittance matrix Y ∈ ℝ^{m×m} may be subsequently
determined with Y = P^T Z^{-1} P, where P ∈ {0,1}^{n×m} with P_ij = 1 if
rectangle i is part of contact j. Direct methods such as Gaussian elimination
require O(n³) operations for solving this equation. Iterative algorithms like
conjugate gradient (CG) usually require O(n²) operations per iteration. Several
different algorithms have been proposed in the literature, which offer a
subquadratic runtime [7,8]. SubCALM is based on the hierarchical algorithm by
Appel [9], with O(n) time consumption [10].
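
To make the step from the rectangle-level impedance matrix to the contact-level admittance matrix concrete, here is a small dense sketch. It is not SubCALM's hierarchical algorithm: it assumes the relation Y = P^T Z^{-1} P, uses plain Gauss-Jordan elimination, and the 3 × 3 matrix Z and the grouping P (two rectangles in contact 0, one in contact 1) are made-up illustration values. The function names are hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(map(float, A[r])) + list(map(float, B[r])) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [[M[r][n + k] / M[r][r] for k in range(m)] for r in range(n)]

def admittance(Z, P):
    """Contact admittance Y = P^T Z^{-1} P from rectangle impedances Z."""
    X = solve(Z, P)                                        # X = Z^{-1} P
    m = len(P[0])
    return [[sum(P[i][a] * X[i][b] for i in range(len(P))) for b in range(m)]
            for a in range(m)]

Z = [[4.0, 1.0, 0.5],      # illustrative symmetric impedance matrix
     [1.0, 4.0, 1.0],
     [0.5, 1.0, 3.0]]
P = [[1, 0],               # rectangles 0 and 1 belong to contact 0
     [1, 0],
     [0, 1]]               # rectangle 2 belongs to contact 1
X = solve(Z, P)
Y = admittance(Z, P)
```

The cubic cost of the elimination is exactly what the hierarchical algorithm is meant to avoid; the sketch only shows what quantity is being computed.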

If the kernel of (4) is 1/r, two rectangles may be combined if R/r < ε [5],
where R is the length of the longest side of the area covered by these two
rectangles, ε is a certain error and r is the distance to the injection point. If
the kernel is arbitrary as in our case, this expression has to be generalised to
L(r, ε) > R. L is derived from G as shown in Fig. 3a. Two rectangles may be
combined if G can be linearised within the whole area. If G were
totally linear, an evaluation of G at the centre of gravity would minimise the
average integration error over Γ_i and Γ_j. An appropriate maximum value for
the linearity interval L(r, ε) with respect to distance is determined in such a
way that the approximation error is limited to a given relative tolerance ε.
The application of such a simplification to a layout is visualised in Fig. 3b.



3 Modelling of Wells 

Well structures cannot be modelled by the BEM easily. However, it becomes
possible by modifying the calculation of Z. A simplified well structure is
shown in Fig. 4a. The pn-junction between wells is replaced by special ‘well
contacts’. Since this pn-junction is reverse-biased, its junction capacitance
C_j has to be considered as well. Two different potentials are assigned to
each well contact, depending on the side from which it is viewed.

Fig. 4. Modelling of wells in the BEM algorithm.



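Considering only well i, all subwells i + 1 are removed first. In addition, a
high-ohmic layer is introduced instead of parent well i − 1 as shown in Fig. 4b.
Now Z′_well of this modified structure is calculated using (4):

    \mathbf{v}' = Z'_{\mathrm{well}} \, \mathbf{i}', \qquad
    Z'_{\mathrm{well}} =
    \begin{pmatrix}
      Z'_{11} & Z'_{12} & Z'_{13} & Z'_{14} \\
      Z'_{21} & Z'_{22} & Z'_{23} & Z'_{24} \\
      Z'_{31} & Z'_{32} & Z'_{33} & Z'_{34} \\
      Z'_{41} & Z'_{42} & Z'_{43} & Z'_{44}
    \end{pmatrix}                                                              (5)

where v′ and i′ collect the potentials and currents of the contacts of the
modified structure, including those of the reference contact.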

Since a well is a closed volume, the currents flowing into it have to sum to zero.
Hence Z′_well is over-determined. The variables dedicated to the reference contact,
V_{r,i} and I_{r,i}, are removed, leading to Z_well. The influence
of the junction capacitances can be considered as follows.

                                                                               (6)

This results in an impedance matrix Z_{well,i} for every well.
All these matrices can be summed up in a symmetric impedance matrix Z
for the whole system.

The above approach can be easily extended to more contacts per well. All
entries in Z′_well and Z_well except those for reference contacts will convert to
submatrices. A modified CG-algorithm [11] has to be used in order to invert
the complex symmetric, but not Hermitian, system impedance matrix Z.



4 Preconditioner for CG-Algorithm

The convergence rate of the CG-algorithm depends on the ratio between the
maximum and the minimum eigenvalue of Z. If a Z with well structures is
evaluated for low frequencies, it can become ill-conditioned. Hence the use
of a preconditioner is recommended. Although Z is dense, several different
possibilities for a preconditioning matrix M still exist:

1. DIAG: Only diagonal elements of Z are considered: M = diag(1/Z_ii).
2. BLOCK-DIAG: Internally, adjacent rectangles are stored inside one rectangle
   object in a binary tree structure [12]. Now entries Z_ij are considered only
   if rectangles i and j are located in the same rectangle object.
   Hence diagonal blocks Z_k of adjacent entries inside Z are chosen to form
   M = diag(Z_k^{-1}).

3. BLOCK-BAND: In addition to BLOCK-DIAG, parts of $Z$ defining the coupling between adjacent rectangle objects are taken into account as well. Let $Z_{k1}$ and $Z_{k2}$ be the diagonal block matrices for rectangle objects $k1$ and $k2$, and $Z_{k1,k2}$ the coupling block matrix between the rectangles of these two objects.

$$\begin{pmatrix} Y_{11} & Y_{12} \\ Y_{12}^{T} & Y_{22} \end{pmatrix} = \begin{pmatrix} Z_{k1} & Z_{k1,k2} \\ Z_{k1,k2}^{T} & Z_{k2} \end{pmatrix}^{-1} \qquad (9)$$

$Y_{12}$ will be used as $M_{k1,k2}$. The diagonal elements $M_{ii}$ must be modified, depending on the off-diagonal entries $M_{ij}$, to make sure that $M$ is positive definite. Table 1 presents the acceleration results due to these preconditioning measures.
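To make the DIAG variant concrete, the following sketch applies a diagonally preconditioned conjugate-gradient iteration to a small dense, complex symmetric system. It rests on assumptions: the text does not spell out the modified CG-algorithm of [11], so the unconjugated bilinear form used here is only one common way to adapt CG to complex symmetric matrices, and the names `cocg_diag` and `demo_system` as well as the random test matrix are hypothetical, not taken from the authors' tool.

```python
import random


def matvec(Z, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(z_ij * v_j for z_ij, v_j in zip(row, v)) for row in Z]


def dot_unconj(u, v):
    """Unconjugated bilinear form u^T v (no complex conjugation)."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return sum(abs(a) ** 2 for a in v) ** 0.5


def cocg_diag(Z, b, tol=1e-10, max_iter=500):
    """CG-type iteration for complex symmetric Z with M = diag(1/Z_ii).

    Assumption: the Hermitian inner product of standard CG is replaced by
    the unconjugated form, which matches the symmetry Z == Z^T.
    """
    n = len(b)
    m_inv = [1.0 / Z[i][i] for i in range(n)]      # DIAG preconditioner
    x = [0j] * n
    r = [complex(b_i) for b_i in b]                # residual for x = 0
    z = [m * r_i for m, r_i in zip(m_inv, r)]
    p = list(z)
    rho = dot_unconj(r, z)
    b_norm = norm(b)
    for it in range(1, max_iter + 1):
        q = matvec(Z, p)
        alpha = rho / dot_unconj(p, q)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * q_i for r_i, q_i in zip(r, q)]
        if norm(r) <= tol * b_norm:
            return x, it
        z = [m * r_i for m, r_i in zip(m_inv, r)]
        rho_new = dot_unconj(r, z)
        beta = rho_new / rho
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rho = rho_new
    return x, max_iter


def demo_system(n=20, seed=0):
    """Hypothetical complex symmetric test matrix with a dominant diagonal."""
    rng = random.Random(seed)
    Z = [[0j] * n for _ in range(n)]
    for i in range(n):
        Z[i][i] = (5.0 + 25.0 * i) * (1.0 + 0.2j)
        for j in range(i):
            c = 0.1 * complex(rng.gauss(0, 1), rng.gauss(0, 1))
            Z[i][j] = c
            Z[j][i] = c                            # symmetric, not Hermitian
    b = [rng.gauss(0, 1) for _ in range(n)]
    return Z, b
```

Calling `cocg_diag(*demo_system())` returns the approximate solution and the iteration count. Replacing the elementwise product with `m_inv` by a solve with the diagonal blocks would correspond to the BLOCK-DIAG idea.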
Table 1. Number of iterations (It) and runtime (t in seconds) of the CG-algorithm with different preconditioners applied to problems with different numbers of variables (NoV) at certain frequencies f, carried out on a Sun Fire 4800 with UltraSPARC III 750 MHz processors and 24 GB memory.

5 Simulation and Measurement Results

A test chip in a 0.25 µm BiCMOS technology with the layer structure shown in Fig. 1b has been designed and manufactured. It contains various guard rings and well structures for both DC and RF measurements. Results of the DC part have already been published in [12]. Structures for RF were measured between 100 MHz and 20 GHz with an HP8510C network analyser. The LRRM method was used for calibration. Measurement results of an n-well structure are presented in Fig. 5. The influence of the voltage-dependent pn-junction capacitance can easily be seen. The difference between measurement and simulation is less than 5 dB, which is sufficient for a coupling estimation.

Fig. 5. RF measurement and simulation for a well structure.



In addition, simulated data for coupling from the substrate into a p-well of a triple well structure are shown in Fig. 5. It was impossible to measure this small coupling on the test chip. However, simulation results for every pn-junction involved (substrate to n-well and n-well to p-well) were verified by measurement. Hence one can assume that coupling into a triple well is at least another 20 dB smaller than into a single well.

6 Conclusion 

A hierarchical substrate simulation tool has been presented, which makes it possible to estimate substrate coupling during floorplanning. A novel well structure model for the boundary element description has been proposed. An accelerated conjugate gradient based algorithm is used, which is capable of dealing with large mixed-signal design layouts. The application to a real-world mixed-signal design will be investigated in the near future.
Abstract. Widely separated time scales appear in many electronic circuits, making analysis with the usual numerical methods very difficult and costly. In this article we present a quasilinear system of partial differential equations (PDE) of first order, where the time scales are treated separately. The PDE corresponds to the system of differential-algebraic equations (DAE) describing the electronic circuit in the sense that the solution of the PDE restricted to one of its characteristics is the solution of the DAE. This embedding method is described in a general setting. Hence it can be used for various applications in circuit simulation.

Since generalized quasiperiodic functions, which are presented here, conceptualize physical properties, they have a basic significance for the embedding method.

Theoretical investigations are presented as well as new approaches for numerical methods based on the connection between the PDE and the DAE.



1 Introduction and Motivation 

An electronic circuit is simulated by solving a system of differential algebraic 
equations (DAEs), which is derived via the modified nodal analysis from 
Kirchhoff’s laws. The system is of the form 

$$\frac{d}{dt} q(x(t)) + f(x(t)) + b(t) = 0, \qquad \text{(DAE)} \qquad (1)$$

with node voltages and branch currents of impedant elements $x : \mathbb{R} \to \mathbb{R}^n$, charge and flux terms of impedant elements $q : \mathbb{R}^n \to \mathbb{R}^n$, static currents $f : \mathbb{R}^n \to \mathbb{R}^n$ and input signals $b : \mathbb{R} \to \mathbb{R}^n$, where the stimulus $b(t)$ is constant for autonomous circuits and time-dependent otherwise. Because the derivative of $q$ is in general not invertible, there are not only dynamical but also algebraic conditions and the system is a proper system of DAEs.

Electronic circuits often have solutions consisting of oscillations with widely separated time scales. Mixers, for example, have solutions which are superpositions of fast and slow oscillations, requiring very small step sizes over a long time interval when calculating the solution numerically. Switching to a multivariate function we can separate the time scales. If e.g. for the



W. H. A. Schilders et al. (eds.), Scientific Computing in Electrical Engineering 
© Springer- Verlag Berlin Heidelberg 2004 




An Embedding Method for High Frequency Circuits 147 



ring modulator described in [7] we choose artificially extreme input signals $U_{in1}(t) = 0.5 \sin(2000\pi t)$ and $U_{in2}(t) = 2 \sin(200000\pi t)$, the frequencies differ by a factor of 100. The solution for one of the node voltages is shown in Fig. 1 a). Assuming that this is a quasiperiodic function of the form $x(t) = \sum_{z \in \mathbb{Z}^2} X_z e^{i (z_1 \omega_1 + z_2 \omega_2) t}$ with $t \in \mathbb{R}$ and $\omega_1 = 2000\pi$, $\omega_2 = 200000\pi$, we can associate a function in two variables of the form $\hat{x}(\tau_1, \tau_2) = \sum_{z \in \mathbb{Z}^2} X_z e^{i (z_1 \tau_1 + z_2 \tau_2)}$. This function is $2\pi$-periodic in both dimensions. If we know $\hat{x}(\tau_1, \tau_2)$ on $[0, 2\pi] \times [0, 2\pi]$ we know it on all of $\mathbb{R}^2$. Given the path $\gamma(t) = (\omega_1 t, \omega_2 t)$ in $\mathbb{R}^2$ we see that $\hat{x}(\gamma(t)) = x(t)$. Note that $\gamma(t)$ has a very steep slope and is close to the $\tau_2$-axis. If we had $\hat{x}(\tau_1, \tau_2)$ on $[0, 2\pi] \times [0, 2\pi]$ we could easily retrieve $x(t)$.
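The retrieval step can be illustrated with a small sketch. It stores a bi-periodic function on a coarse grid of the square and recovers the univariate signal by following the path $(\omega_1 t, \omega_2 t)$ modulo $2\pi$. The function used here is simply the product of the two input tones, chosen as a hypothetical stand-in and not the ring modulator solution; the helper names `sample_grid` and `retrieve` are likewise assumptions.

```python
import math

OMEGA1 = 2000 * math.pi      # slow angular frequency quoted in the text
OMEGA2 = 200000 * math.pi    # fast angular frequency quoted in the text
TWO_PI = 2 * math.pi


def x_hat(tau1, tau2):
    """Hypothetical bi-periodic function: product of the two input tones."""
    return 0.5 * math.sin(tau1) * 2.0 * math.sin(tau2)


def x(t):
    """The corresponding univariate two-tone signal."""
    return 0.5 * math.sin(OMEGA1 * t) * 2.0 * math.sin(OMEGA2 * t)


def sample_grid(n):
    """Store x_hat on an n-by-n grid covering [0, 2*pi) x [0, 2*pi)."""
    h = TWO_PI / n
    return [[x_hat(i * h, j * h) for j in range(n)] for i in range(n)]


def retrieve(grid, t):
    """Evaluate the stored grid at gamma(t) = (omega1*t, omega2*t).

    The path is wrapped back into the square and the value is obtained
    by periodic bilinear interpolation between the four nearest nodes.
    """
    n = len(grid)
    s1 = ((OMEGA1 * t) % TWO_PI) / TWO_PI * n
    s2 = ((OMEGA2 * t) % TWO_PI) / TWO_PI * n
    a, c = s1 - int(s1), s2 - int(s2)          # fractional cell coordinates
    i0, j0 = int(s1) % n, int(s2) % n
    i1, j1 = (i0 + 1) % n, (j0 + 1) % n        # periodic wrap-around
    return ((1 - a) * (1 - c) * grid[i0][j0] + a * (1 - c) * grid[i1][j0]
            + (1 - a) * c * grid[i0][j1] + a * c * grid[i1][j1])
```

One slow period (1 ms) contains 100 fast oscillations, yet a 64-by-64 grid is enough here because the bivariate function varies slowly in both coordinates.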

We could try to compute such a bi-periodic function as a solution of a suitable system of partial differential equations (PDE) associated with the initial system of DAEs. The PDE has to be such that from its solution we can retrieve the solution of the DAE by following a specific path. Solving the DAE via a numerical solution of the associated PDE can decrease the costs drastically, because we have to compute the solution of the PDE only on $[0, 2\pi] \times [0, 2\pi]$, where it is typically a slowly varying function and can be approximated using normal or large step sizes. We call this procedure "embedding the system of DAEs into a system of PDEs". For the above mentioned ring modulator it leads to a system of PDEs which gives the bi-periodic solution on $[0, 2\pi] \times [0, 2\pi]$ in Fig. 1 b) corresponding to the node voltage considered on the left.




Fig. 1. a) Reference solution; b) Solution as a multivariable function



Various embedding methods have been presented in the literature (see [2], [6], [5], [13], [4], [8], [10], [12], [14], [3], [11] and [15] for a survey). They differ in the form of the embedding PDE, the path of evaluation and the considered domain, depending on the objective of the simulation and the assumptions on the special form of the solutions. Here we present a unified approach for the different derivations, such that they can all be considered as special cases of this general embedding method. The form of this general embedding clearly points out the important role of the characteristic lines for this approach.

The relation between the PDE and the initial DAE system can be used to develop new numerical methods to solve the DAE and to give new views on existing methods. Here we suggest a numerical method which stems from a method of characteristics for the PDE, specialized such that it results in a DAE method which works only with the original DAE problem, approximating only every k-th oscillation.

In the following, generalized quasiperiodic functions are introduced, the embedding method is described and theoretical results are given. Applications of this embedding method are presented and numerical algorithms are considered.



2 Generalized Quasiperiodic Functions 



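Consider functions $x : \mathbb{R} \to \mathbb{C}^n$ of the form

$$x(t) = \sum_{z \in \mathbb{Z}^m} X_z(t)\, e^{\, i \sum_{j=1}^{m} z_j \int_0^t \omega_j(s)\, ds} \qquad (2)$$

where $X_z(t) : \mathbb{R} \to \mathbb{C}^n$ and $\omega = (\omega_1, \dots, \omega_m)$ with $\omega : \mathbb{R} \to \mathbb{R}^m$.

We call them generalized quasiperiodic functions. If the factors $X_z(t)$ and $\omega_j(t)$ do not depend on $t$, they are just the well-known quasiperiodic functions. Throughout this paper these sums are assumed to be finite.

Now let $x$ be a generalized quasiperiodic function. With $\omega$ from the generalized quasiperiodic representation of $x$ above we can build for a given interval $I \subset \mathbb{R}$ a path $\gamma$ in $\mathbb{R}^m$ as:

$$\gamma(t) := \int_0^t \omega(s)\, ds : I \to \mathbb{R}^m.$$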



To each generalized quasiperiodic function $x$ we associate a function $\hat{x}$ with multidimensional domain of the following form:

$$\hat{x}(\tau) = \sum_{z \in \mathbb{Z}^m} \hat{X}_z(\tau)\, e^{\, i \left( \sum_{j=1}^{m} z_j \tau_j \right)}$$

where $\tau = (\tau_1, \tau_2, \dots, \tau_m) \in \mathbb{R}^m$ and the $\hat{X}_z(\tau)$ satisfy

$$\hat{X}_z(\gamma(t)) = X_z(t) \quad \text{for all } t \in I. \qquad (3)$$

Then $\hat{x}(\gamma(t)) = x(t)$. We call such a function a multivariate function associated to the generalized quasiperiodic function. Because different functions $\hat{X}_z$ could satisfy the property (3) above, an associated multivariate function is in general not unique. It depends on the choice of $\hat{X}_z$.

Note that if the $X_z$ and $\omega_j$ are constant, the associated multivariate function is $2\pi$-periodic in each coordinate by construction.
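A small numerical illustration of these definitions, for $m = 2$, may help. It is a sketch under stated assumptions: the path is taken to be the integral of $\omega$ from 0 to $t$, and the amplitude, the frequency modulation and the constant `OMEGA_FAST` are hypothetical choices made only for this example.

```python
import math

OMEGA_FAST = 50.0   # hypothetical base value of the fast frequency


def omega(t):
    """Time-dependent frequencies: omega_1 = 1, omega_2 is modulated."""
    return (1.0, OMEGA_FAST * (1.0 + 0.3 * math.cos(t)))


def gamma_exact(t):
    """Closed-form integral of omega over [0, t]."""
    return (t, OMEGA_FAST * (t + 0.3 * math.sin(t)))


def gamma_numeric(t, steps=2000):
    """Trapezoidal approximation of the same integral."""
    h = t / steps
    g1 = g2 = 0.0
    w_prev = omega(0.0)
    for k in range(1, steps + 1):
        w = omega(k * h)
        g1 += 0.5 * h * (w_prev[0] + w[0])
        g2 += 0.5 * h * (w_prev[1] + w[1])
        w_prev = w
    return (g1, g2)


def x(t):
    """Generalized quasiperiodic signal: slow amplitude, modulated phase."""
    amplitude = 1.0 + 0.5 * math.sin(t)
    return amplitude * math.cos(gamma_exact(t)[1])


def x_hat(tau1, tau2):
    """One associated multivariate function, 2*pi-periodic in tau1 and tau2.

    The amplitude is written as a function of tau1 only, which matches the
    amplitude of x along the path because gamma_1(t) = t.
    """
    return (1.0 + 0.5 * math.sin(tau1)) * math.cos(tau2)
```

Evaluating `x_hat(*gamma_exact(t))` reproduces `x(t)`. Other amplitude functions that agree with it along the path would serve equally well, which is the non-uniqueness noted above.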
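3 Embedding Method and Theoretical Investigations

For the circuit equation (1) the corresponding embedding system, a system of partial differential equations, is defined by

$$\sum_{j=1}^{m} \omega_j(\tau) \frac{\partial}{\partial \tau_j} q(\hat{x}(\tau)) + f(\hat{x}(\tau)) + \hat{b}(\tau) = 0, \qquad \text{(PDE)} \qquad (4)$$

where $\hat{x} : \Omega \to \mathbb{R}^n$ ($\Omega \subset \mathbb{R}^m$, $\Omega$ open and connected), $\tau = (\tau_1, \dots, \tau_m) \in \Omega$, $\omega_j : \Omega \to \mathbb{R}$, $\omega_j(\tau) > 0$, $m \in \mathbb{N}$; in practice often $m < 3$. $q$ and $f$ can be taken directly from (1), but $\hat{b}$ must be chosen such that if evaluated along the path

$$\gamma : I \to \mathbb{R}^m, \quad \gamma(t) = (\gamma_1(t), \gamma_2(t), \dots, \gamma_m(t)) \qquad (5)$$

it coincides with $b$ ($I \subset \mathbb{R}$ interval with $0 \in I$), i.e. $\hat{b}(\gamma(t)) = b(t)$. The path $\gamma$ is chosen to be the solution of the initial value problem

$$\dot{\gamma}_j(t) = \omega_j(\gamma(t)) \quad \text{for all } t \in I, \; j = 1, \dots, m, \qquad (6)$$

$$\gamma(0) = 0 \in \mathbb{R}^m. \qquad (7)$$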
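Why a solution of the embedding system yields a solution of the circuit equation can be seen from a single chain-rule step. The following is a sketch, assuming that the multivariate solution $\hat{x}$ is differentiable along the path, and writing $\hat{x}$, $\hat{b}$ for the multivariate functions and $\omega_j$ for the coefficient functions of the PDE:

```latex
% Define x(t) := \hat{x}(\gamma(t)), with \dot{\gamma}_j(t) = \omega_j(\gamma(t)).
\frac{d}{dt}\, q\bigl(\hat{x}(\gamma(t))\bigr)
  = \sum_{j=1}^{m} \dot{\gamma}_j(t)\,
    \frac{\partial}{\partial \tau_j}\, q\bigl(\hat{x}(\tau)\bigr)\Big|_{\tau = \gamma(t)}
  = \sum_{j=1}^{m} \omega_j(\gamma(t))\,
    \frac{\partial}{\partial \tau_j}\, q\bigl(\hat{x}(\tau)\bigr)\Big|_{\tau = \gamma(t)} .

% Evaluating the PDE at \tau = \gamma(t) and using \hat{b}(\gamma(t)) = b(t):
\frac{d}{dt}\, q\bigl(x(t)\bigr) + f\bigl(x(t)\bigr) + b(t) = 0 .
```

So the restriction of the PDE solution to the path satisfies the DAE (1), which is the sense in which the DAE is embedded.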

Equation (4) is a system of quasilinear partial differential equations of first order. If $Dq$, the Jacobian of $q$, is invertible, the equation can be decoupled (see [12] for a special case). Presumably this proof can be generalized to problems of index 1.

Then the theory of characteristics is applicable. It can be shown that the characteristics do not intersect. The problem of existence of solutions of the PDE can be reduced to the existence of a family of solutions of the DAE with appropriate stimulus. Because the characteristics do not intersect, only a "blow up" can prevent the existence. Conversely, the existence of a PDE solution implies the existence of the DAE solution.



4 Applications 

Various embedding methods have been presented in the literature, but they 
differ in the partial differential equation, the path of evaluation and the con- 
sidered domain. The approach above allows a general derivation and provides 
a theoretical framework. In the following the results are summarized. 

If we specify the coefficient functions $\omega_j$, $j = 1, \dots, m$, the path of evaluation is determined. Together with the specification of the domain of the PDE various circuit simulation problems can be treated. These are e.g. steady states of non-autonomous and mixed autonomous/non-autonomous circuits with quasiperiodic stimulus (see [1], [2], [3], [6], [10], [14]). Also transient behavior (see [9], [13], [14]) as well as steady state analysis for autonomous circuits with quasiperiodic steady states can be treated. Furthermore one can choose an embedding system to analyze the transient behavior of high-Q oscillators (see [2], [4]). In [3] an embedding system for the study of "injection locking" is also presented.

Several numerical methods are considered in the literature as well. Essentially these are "multi-tone harmonic balance" and standard methods to solve the PDE directly.



5 Numerical Methods 



The following algorithm, which is a modification of the method of character- 
istics presented in [10], calculates the steady state solution of electric circuits 
with two superimposed oscillations. It is based on the described embedding 
method and utilizes the circuit equation directly. The system of partial dif- 
ferential equations provides periodicity conditions on time intervals where 
the circuit equation is evaluated. At first the frequencies are assumed to be 
known. Then an extension to the case of unknown frequencies is discussed. 

We consider steady state solutions of the form

$$x(t) = \sum_{z \in \mathbb{Z}^2} X_z\, e^{\, i (z_1 \omega_1 + z_2 \omega_2) t} \qquad (8)$$

with $\omega_2 \gg \omega_1$ both known. The solution is embedded in the $2\pi$-biperiodic function

$$\hat{x}(\tau) = \sum_{z \in \mathbb{Z}^2} X_z\, e^{\, i (z_1 \tau_1 + z_2 \tau_2)} \qquad (9)$$

which returns $x$ when evaluated on the path $\gamma(t) = (\omega_1 t, \omega_2 t)$. The circuit is described by the differential-algebraic system (1) which corresponds to the system of partial differential equations

$$\omega_1 \frac{\partial}{\partial \tau_1} q(\hat{x}(\tau)) + \omega_2 \frac{\partial}{\partial \tau_2} q(\hat{x}(\tau)) + f(\hat{x}(\tau)) + \hat{b}(\tau) = 0. \qquad (10)$$

The PDE is considered on the domain $[0, 2\pi) \times [0, 2\pi)$ with periodic boundary conditions. The evaluation path $\gamma$ is projected on this square, see Fig. 2,a. For the algorithm we select every k-th of these projected lines (Fig. 2,b). The PDE solution restricted to the selected lines is exactly the DAE solution on the corresponding time intervals (Fig. 2,c). These parts of the DAE solution are computed by a multiple shooting method. This, together with the PDE's periodicity conditions, yields the nonlinear system of equations to be solved (Fig. 2,b).
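The geometric part of this construction, projecting the path onto the square and picking every k-th line, can be sketched as follows. Only the bookkeeping is shown; the multiple shooting solver and the periodicity conditions are not implemented, and the function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def projected_lines(omega1, omega2, n_lines):
    """Describe the pieces of gamma(t) = (omega1*t, omega2*t) in the square.

    Piece j starts at time t_j = 2*pi*j/omega2, where tau2 wraps back to 0,
    and enters the square at tau1 = omega1 * t_j (mod 2*pi). Each entry is
    (tau1_start, t_start, t_end); [t_start, t_end] is the time interval on
    which the DAE would be integrated for that piece.
    """
    lines = []
    for j in range(n_lines):
        t_start = TWO_PI * j / omega2
        t_end = TWO_PI * (j + 1) / omega2
        tau1_start = (omega1 * t_start) % TWO_PI
        lines.append((tau1_start, t_start, t_end))
    return lines


def select_every_kth(lines, k):
    """Keep every k-th projected line, starting with the first one."""
    return lines[::k]
```

For the frequencies used in the text, `projected_lines(2000 * math.pi, 200000 * math.pi, 100)` gives 100 lines covering one slow period, and `select_every_kth(..., 10)` keeps 10 of them, so only 10 of the 100 fast oscillations would have to be integrated.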

The algorithm was tested on the ring modulator with frequencies $\omega_1 = 2000\pi$ and $\omega_2 = 200000\pi$. Figure 3,a shows the result of the algorithm after one Newton step. As initial values we chose a 10% perturbation of the steady state solution in 40 points. Again, the calculated oscillations are marked. The oscillations in between are obtained by linear interpolation and therefore the oscillation peaks are piecewise linear, which could be improved without

Fig. 2. a) Projection of the characteristic $\gamma$; b) Characteristics and periodicity conditions; c) Solution: Marked oscillations correspond to the characteristics which are used for calculation




Fig. 3. a) Solution by method of characteristics; b) Solution as a multivariate func- 
tion 



significant increase of calculations e.g. by quadratic interpolation. Figure 3,b 
illustrates the computed result as the corresponding multivariate function. 

The method described above can be extended to the case of unknown frequencies. This problem occurs for autonomous circuits. The frequencies are now treated as additional unknowns. Because the characteristics depend on the frequencies, it is necessary to adjust them at certain states and the algorithm must be embedded in an outer iteration loop. As the frequencies add to the number of unknowns, two more equations are needed. The value of one component of the solution can be fixed at a certain time and/or the derivative of some components can be set to zero. The advantage of the first choice is that the zero solution is excluded (see [11]). However, it cannot be guaranteed that the fixed value can be reached by the solution. Another possibility is to choose a calibration condition of the form






, 27T 



J, j ar(i)cos(— : 



0 



( 11 ) 
for each parameter. After each Newton step it has to be checked that the current frequencies are still admissible. Otherwise a next step of the outer loop must be started.

Analogously an algorithm for the transient analysis of oscillators can be 
developed. Here the principal idea is presented and difficulties are discussed. 

The start-up behavior of oscillators is described by the DAE (1) together with the initial condition $x(0) = x_0$. It is assumed that the solution is a fast oscillation which is modulated by a slowly varying function. We use the ansatz

$$x(t) = \sum_{z} X_z(t)\, e^{\, i z_2 \omega_2 t}, \qquad (12)$$

where $m = 2$, $z = (0, z_2)$ and known $\omega_2$. The embedding system is

$$\frac{\partial}{\partial \tau_1} q(\hat{x}) + \omega_2 \frac{\partial}{\partial \tau_2} q(\hat{x}) + f(\hat{x}) + \hat{b} = 0 \quad \text{on } [0, 2\pi) \times \mathbb{R}_+. \qquad (13)$$

The solution is assumed to be $2\pi$-periodic in $\tau_2$. This yields the boundary condition $\hat{x}(\tau_1, 0) = \hat{x}(\tau_1, 2\pi)$. We have the initial condition $\hat{x}(0, 0) = \hat{x}_0(0) = x_0$ from the DAE. But in addition we need initial values $\hat{x}(0, \tau_2) = \hat{x}_0(\tau_2)$ on $[0, 2\pi)$ to solve the PDE numerically.

With numerical algorithms proceeding in the $\tau_1$-direction, approximate values on the k-th characteristic can be obtained. (Multiple) shooting methods can correct these extrapolated values. This is then continued in the $\tau_1$-direction. This approach is illustrated in Fig. 4.





Fig. 4. a) Characteristics and periodicity in the $\tau_2$-direction; b) Solution: Marked oscillations correspond to the characteristics which are used for calculation



This method still has some problems. First of all, different choices of initial values on the $\tau_2$-axis lead to the same solution of the DAE. It is important for numerical stability and efficiency to choose initial conditions such that the solution is smooth. Extrapolation methods to proceed in the $\tau_1$-direction are not efficient. BDF methods for the PDE are highly damping, which is unsuitable for solutions with increasing oscillations (see [3]). When correcting with the multiple shooting method, different solutions are possible. Again, further calibration conditions have to be used.
Abstract. In this work we deal with the numerical simulation of thermal oxidation in silicon device technology. This application is a complex coupled phenomenon, involving the solution of a diffusion-reaction problem and of a fluid-structure interaction problem. Suitable iterative procedures are devised for handling nonlinearities and strong coupling between the sub-problems to be solved. In particular, we propose a unified dual-mixed hybrid formulation that allows for the simultaneous solution of the compressible/incompressible Navier equations in both solid and fluid domains. The accuracy and the flexibility of the proposed approach are demonstrated on benchmark test problems.



1 Introduction and motivation 

Thermal oxidation of silicon is one of the several steps involved in the manufacturing of integrated circuits (IC). The silicon dioxide is the product of the following chemical reaction

$$\mathrm{Si} + \mathrm{O_2} \to \mathrm{SiO_2}.$$

Silicon dioxide is thermally grown on the silicon wafer bulk to:

- electrically insulate basic devices like transistors and capacitors built on a single wafer

- act as gate oxide in Metal Oxide Semiconductor (MOS) structures or serve as a mask against dopant implantation.

Numerical simulation of the thermal oxidation process is aimed at predicting the oxide shape after oxidation in order to better assess the electrical performance of the device. Moreover, it is of interest to analyze the stress history of the material in order to study its effect on the evolution of the oxidation process and to prevent mechanical failures. Realistic simulations of the process are achieved by taking into account different phenomena arising in a strongly heterogeneous assembly of materials. Fig. 1 shows schematically the reduction from a 3D model to a 2D model of the local oxidation structure (LOCOS). The most widely adopted mathematical model of the oxidation process consists in solving two PDE systems, the first

Fig. 1. Schematics of the thermal oxidation process in a local oxidation structure (LOCOS): 3D model (left), 2D reduction (center) at the beginning of the oxidation and 2D model (right) after process completion.



being a diffusion-reaction problem for the oxidant and the second a stress analysis in the oxide, nitride and silicon bulk. The two PDE systems are mutually dependent: the diffusion and kinetic reaction coefficients as well as the geometry of the deformed domain depend on the stress distribution; in turn, the chemical reaction forces the oxide-silicon interface to move, driving the mechanical problem. This first level of coupling is handled by using in the diffusion-reaction problem at the new time $t^{n+1}$ the coefficients and the geometry computed from the stress field at time $t^n$. An incremental stress analysis is then performed on the structure subjected to the displacements due to the computed rate of silicon consumption and dioxide expansion (see Fig. 2).




Fig. 2. Flow diagram for the fully coupled problem.



This incremental stress analysis phase introduces a second nested level of coupling, since it requires solving a set of coupled mechanical problems, each one in a different material: the Si3N4 mask and the Si bulk are modeled as linear elastic materials, while a non-Newtonian incompressible fluid model with nonlinear stress-dependent viscosity is used for the SiO2 [3]. Different strategies may be pursued to handle this second coupled problem. These are discussed in Sect. 3.1. The data exchanged between the sub-blocks in both coupled systems are fluxes and stress-dependent quantities, so that the quality of their approximation clearly affects the accuracy of the overall computation. In a standard displacement finite element approach fluxes and stresses are typically post-processed quantities that suffer from a number of limitations. Examples of these limitations are the failure of the post-processed stresses to satisfy self-equilibrium and interelement traction reciprocity, the lack of continuity of the fluxes at the interelement interfaces and the possible onset of locking problems in the incompressible regime. The aim of this work is to investigate and demonstrate the use of alternative finite element formulations specifically tailored to overcome the above mentioned shortcomings. These formulations, known under the comprehensive name of mixed and hybrid finite element techniques and originally developed in the framework of structural analysis, approximate with the same accuracy and physical adherence both the primal fields (e.g. the displacements) and the dual fields (e.g. stresses and fluxes). A convenient implementation of the proposed methods, based on hybridization and static condensation, yields efficient numerical algorithms with computational effort comparable to standard displacement formulations.

The paper is organized as follows: in Sect. 2 we discuss the finite element 
discretization of the diffusion-reaction problem and of the fluid-mechanical 
problem. Sect. 3 deals with the decoupled algorithm used to iteratively solve 
the thermal oxidation problem, while Sect. 4 demonstrates the performance 
of the numerical method on several benchmark test-cases. Finally, some con- 
cluding remarks are drawn in Sect. 5. 

2 Finite Element Discretization 

In this section we discuss the finite element discretization of each subproblem 
in thermal oxidation. 



2.1 Notation 

In the following, we shall denote by $\Omega$ a bounded open set in $\mathbb{R}^2$ with Lipschitz continuous boundary $\Gamma = \Gamma_D \cup \Gamma_N \cup \Gamma_R$, where $\Gamma_D$, $\Gamma_N$, $\Gamma_R$ are the Dirichlet, Neumann and Robin portions of $\Gamma$, respectively, with $\Gamma_R = \emptyset$ in the fluid-mechanical problem. Let $\mathcal{T}_h$ be a regular partition [2] of $\Omega$ into triangles $K$ such that

$$\bar{\Omega} = \bigcup_{K \in \mathcal{T}_h} K.$$

For each element $K \in \mathcal{T}_h$ denote by $\partial K$ the Lipschitz continuous boundary of $K$, by $\partial K_{int}$ the portion of $\partial K$ such that $\partial K \cap \Gamma = \emptyset$ and by $n_K$ the unit outward normal vector along the boundary $\partial K$. Moreover, if $v$ is any function defined in $\Omega$, we denote by $v^K$ its restriction to the element $K$ and by $v_{\partial K}$ its restriction on the element boundary $\partial K$.



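2.2 Diffusion-Reaction Problem

The diffusion-reaction problem is solved using the following primal-hybrid nonconforming finite element formulation: find $C_h$ in the discrete trial space such that for all $v_h$ in the corresponding test space we have

$$\sum_{K \in \mathcal{T}_h} \left( \int_K D \nabla C_h \cdot \nabla v_h \, dx + \int_{\partial K \cap \Gamma_R} D^{-1} k_s C_h v_h \, ds \right) = \sum_{K \in \mathcal{T}_h} \int_K f v_h \, dx, \qquad (1)$$

where the trial space is the set of affine functions that are:

- mid-point continuous on each edge of the triangulation

- equal to the average of $g$ at the mid-point of each edge of $\Gamma_D$, for any function $g \in L^2(\Gamma_D)$.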

In this formulation the primal variable $C_h$ (approximate oxidant concentration) is sought to be a priori discontinuous and the normal flux of the variable itself arises as a Lagrangian multiplier to enforce interelement continuity. The relaxation of interelement continuity for $C_h$ has the advantage of providing an approximation $p_h$ of the flux $p = -D \nabla C$ that satisfies the self-equilibrium condition element by element and that has continuous normal components across interelement edges between neighboring triangles. As a consequence, the normal component of the velocity of the interface is directly computed from the normal flux of the concentration as $V_n = -p_h \cdot n / C_h$, where $n$ is the outward unit normal vector on the oxide-silicon interface (see [4] for the physical-mathematical derivation of the above relation). Notice that no post-processing of the computed approximate concentration field $C_h$ is needed, unlike in standard displacement-based finite element methods. Moreover, optimal second order convergence in $L^2(\Omega)$ can be proved for $C_h$ (see [6]).
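As an illustration of affine functions that are continuous only at edge mid-points, the sketch below builds the three local basis functions on one triangle and the element matrix of the diffusion term. It assumes a constant diffusivity per element and uses the basis $\varphi_i = 1 - 2\lambda_i$ (with $\lambda_i$ the barycentric coordinates), commonly known as the Crouzeix-Raviart basis; this identification and all function names are our assumptions, and the Robin term and the assembly are omitted.

```python
def barycentric_gradients(verts):
    """Gradients of the barycentric coordinates and the triangle area.

    verts is a list of three (x, y) vertex tuples.
    """
    (x0, y0), (x1, y1), (x2, y2) = verts
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)   # twice the signed area
    grads = [
        ((y1 - y2) / det, (x2 - x1) / det),
        ((y2 - y0) / det, (x0 - x2) / det),
        ((y0 - y1) / det, (x1 - x0) / det),
    ]
    return grads, 0.5 * abs(det)


def midpoint_basis_at(verts, point):
    """Values of phi_i = 1 - 2*lambda_i at a point of the triangle.

    phi_i equals 1 at the mid-point of the edge opposite vertex i and
    0 at the mid-points of the other two edges.
    """
    grads, _ = barycentric_gradients(verts)
    values = []
    for (gx, gy), (xv, yv) in zip(grads, verts):
        lam = 1.0 + gx * (point[0] - xv) + gy * (point[1] - yv)
        values.append(1.0 - 2.0 * lam)
    return values


def local_diffusion_matrix(verts, diffusivity=1.0):
    """Element matrix of the term  integral_K D grad(phi_j) . grad(phi_i) dx."""
    grads, area = barycentric_gradients(verts)
    # grad(phi_i) = -2 * grad(lambda_i), hence the factor 4.
    return [[4.0 * diffusivity * area * (gi[0] * gj[0] + gi[1] * gj[1])
             for gj in grads] for gi in grads]
```

Because the degrees of freedom sit on edges, each one is shared by at most two triangles, which is what makes the interelement flux available without post-processing in the hybrid setting described above.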



2.3 Fluid-mechanical problem 

For the solution of the fluid-mechanical problem, we adopt the novel dual 
mixed-hybrid finite element formulation introduced and analyzed in [1]. This 
method provides an accurate stress representation and at the same time han- 
dles under a unified formulation both the compressible and incompressible 
regimes by introducing a pressure function. This avoids resorting to the quasi- 
incompressible approximation, which is a common approach to deal with in- 
compressible materials in this application field, or using separate computer 
codes, with a significant saving of software maintenance. The discrete formu- 
lation of the problem reads: 
find $(\sigma_h, u_h, \lambda_h, p_h, \omega_h)$ in the product of the finite element spaces defined in (3) such that



f i / <Jh-Thdx,+ '^{ Uh ■ divThdx - \h ■ {th n) ds) 

\ K^h '^dKir.r 



J Q ^ Jo KeTh 



Q 

PI, 

Q ^ J Q 

/ Vh • div Gh dx 
Ken ^ 



Qd • {rh n) ds, Wh G ^h,o, 



■ f -Vh yvh e Uh, 
Jo 



f Gh -\- Ph)qh dx = 0 ^qh^Qh, 

Jo ^ 

/ 'dh^SGhdx = 0 ydh^Wh, 

Jo 

I- E / 

I Ken 



( 2 ) 



Ph * (cThn) ds=Y2 9N • Ph yph € Ah,o, 

Ken 



where A and p being the Lame coefficients of the material. 

p{\ + p) 

Notice that for $\lambda = +\infty$ system (2) becomes the discrete approximation of the Stokes problem for incompressible fluids. As for the finite element spaces, for $k \ge 0$, we denote by $\mathbb{P}_k(K)$ the space of polynomials in two variables of total degree at most $k$ on the element $K$ and by $R_k(\partial K)$ the space of polynomials of total degree at most $k$ on each edge of $K$. Notice that functions belonging to $R_k(\partial K)$ need not be continuous at the vertices of $\partial K$. Furthermore, we denote by $RT_0(K)$ the lowest order Raviart-Thomas finite element space [5] on $K$ and by $B_K = \mathrm{curl}(b_K)$, where $b_K$ is the cubic bubble function on $K$. The finite element spaces in (2) are defined as follows:



= {r^ G (RTo(X) 0 Bk)\ r^n = on Rn}, Vh = {v^ G 
Wh = {deC°{72)\d^ Qh = {q’^ e%{K)), ( 3 ) 

Ah,r, = {X^ e {Ro{dK))\ A = Pt? on Fd}, VhT 6 T^, 



where $P$ is the projection over the space of piecewise constant functions and $\xi$, $\eta$ are given functions in $(L^2(\Gamma_N))^2$ and $(L^2(\Gamma_D))^2$, respectively. Notice that two kinds of Lagrangian multipliers have been introduced in formulation (2). The variable $\omega_h$ is a rotational parameter that avoids requiring the stress tensor to be sought a priori in a symmetric function space. The hybrid variable $\lambda_h$ is instead the Lagrangian multiplier that enforces the continuity of the normal component of the stress tensor across the interelement interfaces. The abstract analysis of the above formulation has been carried out in [1], where in particular a superconvergence result has been shown for $\lambda_h$, as is typical of mixed methods with hybridization.
3 Iterative algorithm for the coupled problem

The numerical solution of the fully coupled problem is achieved by a sequence of successive steps. The computer code implements the following staggered algorithm:



For $n > 0$, given at the time level $t^n$ the solution, the geometry and the diffusion and reaction coefficients, at the new time level $t^{n+1}$ we have to:

1. solve the diffusion-reaction problem and compute the propagation velocity of the SiO2-Si interface;

2. perform a mechanical stress analysis in the SiO2, Si3N4 and Si domains;

3. determine the new geometry of the Si and SiO2 and the maximum allowed time step $\Delta t$ according to the present (deformed) configuration;

4. set $t^{n+1} = t^n + \Delta t$. Update the stress-dependent $D$ and $k_s$ coefficients;

5. update the nodal grid point configuration by relaxing the deformed mesh;

6. if the final simulation time has not been reached, go to 1, else end the simulation.



This decoupled procedure has the advantage of splitting the solution of the full problem into several self-contained subproblems of smaller size. Particular attention must be paid to the solution of the stress problem in step 2 of the algorithm, a topic that is addressed in the next section.
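The control flow of the staggered algorithm can be summarized in a short driver sketch. The five callables stand for the sub-solvers of steps 1 to 5; their names, signatures and the `state` object are hypothetical placeholders, since the text does not describe the interfaces of the actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OxidationSteps:
    """Hypothetical sub-solvers, one per step of the staggered algorithm."""
    solve_diffusion_reaction: Callable[[Any], Any]   # step 1: returns interface velocity
    stress_analysis: Callable[[Any, Any], Any]       # step 2: returns stress field
    update_geometry: Callable[[Any, Any], float]     # step 3: returns allowed time step
    update_coefficients: Callable[[Any, Any], None]  # step 4: stress-dependent D, k_s
    relax_mesh: Callable[[Any], None]                # step 5: mesh relaxation


def run_staggered(state, steps, t_final, t_start=0.0):
    """Advance the coupled problem until t_final; returns (time, step count)."""
    t = t_start
    n_steps = 0
    while t < t_final:                                   # step 6: loop condition
        velocity = steps.solve_diffusion_reaction(state)  # step 1
        stress = steps.stress_analysis(state, velocity)   # step 2
        dt = steps.update_geometry(state, velocity)       # step 3
        if dt <= 0.0:
            raise ValueError("time step must be positive")
        t += dt                                           # step 4: t^{n+1} = t^n + dt
        steps.update_coefficients(state, stress)
        steps.relax_mesh(state)                           # step 5
        n_steps += 1
    return t, n_steps
```

Keeping each sub-problem behind its own callable mirrors the advantage noted above: every step is a self-contained problem that can be replaced or tested on its own.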

3.1 Fluid-mechanical coupled sub-problem

The stress analysis problem is a fluid-structure interaction problem. In its basic implementation, the Stokes problem in the SiO2 domain is solved first and the computed normal stresses are used to load the elastic problem in the Si and Si3N4 domains. In this approach, no response from these latter materials is fed back to the SiO2 domain. This strategy, usually referred to as the Boundary Loading Method (BLM), is economical and widely adopted in the literature. However, real-life elastic materials do possess memory and as a consequence they tend to "squeeze out" the fluid while relaxing to their initial configuration. Accounting for this behavior by simply loading back the SiO2 domain with the deformations computed from the Si and Si3N4 domains invariably leads to severe instabilities in the overall numerical procedure, as can be explained by the following argument. Assume that the fluid-mechanical system is modeled by a spring (elastic behavior) placed in series with a damper (fluid behavior). An elementary analysis reveals that solving iteratively the motion of the spring-damper system by a decoupled procedure leads to conflicting requirements on the time step $\Delta t$: in order to ensure stability of the procedure, $\Delta t$ should be small enough when the displacement of the spring is computed from the velocity imposed by the damper, while $\Delta t$ should be large enough when the velocity of the damper is computed from the displacement imposed by the spring. Based on the above considerations, we have adopted the following approach:

1. The interaction between the Si domain and the SiO2-Si3N4 domains is handled by the standard BLM, since it can be checked that the deformations produced by the Si domain on the SiO2 are negligible.

2. The interaction between the SiO2 and the Si3N4 domains is handled by a coupled procedure, with an inner iterative map to solve for the nonlinear dependence of the oxide viscosity on the normal stresses. This coupling procedure is implemented exploiting the unified compressible/incompressible formulation discussed in Sect. 2.

4 Numerical results 

As a first test case, we show the results for a simple fluid-elastic structure solved with the unified coupled procedure illustrated in the previous section. The domain $\Omega$ is the unit square, with the upper half behaving like a fluid and the lower half behaving like an elastic solid (with a very low Young modulus). For a certain time interval a compressive load is applied on the top edge of the fluid domain. Then the load is released and the elastic solid relaxes, recovering its original shape and squeezing out the fluid, as shown in Fig. 3 where some phases of the evolution of the phenomenon are displayed. The second benchmark problem that we have considered is the




Fig. 3. Evolution of the coupled fluid-elastic solid system. 



complete simulation of the thermal oxidation process in a LOCOS structure. 
The computational domain is one half of the domain shown in Fig.l. The 
geometry and the material properties have been chosen as in [ 4 ]. In Fig .4 
the deformed configuration and the corresponding pressure field are shown 
at different time levels. The typical ’’bird’s beak” shape of the final oxide 
configuration is clearly recognizable. Notice also how the largest stress arise 
on the junction line between the Si02 and the Si3N4 regions and in particular 
near the lateral edge of the Si3N4 band. 

5 Conclusions 

Thermal oxidation in silicon device technology is a complex coupled
phenomenon, involving the solution of a diffusion-reaction problem and of a
fluid-structure interaction problem. Special attention has been devoted in the
present work to devising suitable procedures for handling the two single
problems as well as their coupling. In particular, we have proposed a unified
dual-mixed hybrid formulation that allows for the simultaneous solution
of the compressible/incompressible Navier equations in both solid and fluid
domains. Numerical results on benchmark test problems show the accuracy
and the flexibility of the approach.

Fig. 4. Deformed configuration (top, zoom of the area) and pressure field (bottom)
for t = 300, 600, 1500 s.



Abstract. Nonlinear transient eddy current simulations require the solution of
nonlinear differential-algebraic systems of equations of index 1, for which linear- 
implicit time marching methods of Rosenbrock-type are proposed. These meth- 
ods avoid the iterative solution of nonlinear systems within each time step due to 
their built-in Newton procedures. Embedded lower order schemes allow an error- 
controlled adaptive time step selection to take into account the nonlinear dynamics 
of the underlying process. Extrapolation methods used for start value generation 
include a new subspace projection method to improve the numerical performance 
of the simulations. 



1 Introduction 

Using the Finite Integration Technique (FIT) [16], a discretization method
which reformulates Maxwell’s equations in their integral form on a dual
grid pair $\{G, \tilde{G}\}$ into a set of matrix equations, the simulation of
transient magnetic fields can be performed with the solution of a nonlinear
differential-algebraic system of equations of index 1 (DAE-1)

    $M_\kappa \frac{d}{dt} a(t) + \tilde{C} M_\nu[a(t)]\, C a(t) = j_s(t), \quad a_0 := a(t_0)$,    (1)

where $a$ denotes the component vector for the line integrals of the modified
magnetic vector potential along the edges of the grid $G$, the matrices $C$,
$\tilde{C}$ are discrete curl operators containing the incidence relations of
the grid edges, the matrix $M_\kappa$ is the commonly singular matrix of
electric conductivities, the matrix $M_\nu$ is the matrix of material
reluctivities, which are solution dependent in case of ferromagnetic saturation
effects, and the vector $j_s$ contains the time dependent excitation source
currents. The eddy currents and the magnetic fluxes are given with
$-M_\kappa \frac{d}{dt} a$ and $b = C a$, respectively [4].

A computationally slightly more efficient reformulation of (1) introducing the
reduced vector potential is given in [5].

Since the material coefficients in the reluctivity matrix commonly have
measurement errors in the range of several percent, low accuracy requirements
can be applied for the solution of (1). In [3], [15] and [7] the usage
of a third order singly diagonal implicit Runge-Kutta (SDiRK3(2)) method
with an embedded second order method was proposed for an error-controlled
variable step length time integration of system (1). In this method, however,
an additional error control scheme is required for the solution of the
nonlinear algebraic systems of equations of each internal stage. In this paper
we propose to use linear-implicit Rosenbrock methods [12], [13] instead
for the solution of system (1), which avoid the application of such
additional linearization schemes.

2 Linear-Implicit Rosenbrock Time Integration 
Methods 

Rosenbrock methods belong to the family of one-step linear-implicit time
integration methods that differ from SDiRK methods in that they already
incorporate the Jacobi matrix
$\frac{1}{\gamma \Delta t} M_\kappa + \tilde{C} M_{\nu_d}[a^n] C$ of the
Newton-Raphson method in the $s$-stage scheme. The Jacobi matrix contains
the material matrix $M_{\nu_d}[a^n]$ of the differential reluctivities
$\nu_d = \partial H / \partial B$, i.e., the inverse of the local derivative
in the B-H material curve evaluated for the magnetic fluxes $b = C a^n$ at
time $t_n$. The application of a Rosenbrock method to (1) yields the linear
algebraic systems of equations

    $\left[ \frac{1}{\gamma \Delta t} M_\kappa + \tilde{C} M_{\nu_d}[a^n] C \right] v_{ni} = -\tilde{C} M_\nu[v_i] C v_i + j_s(t_i) + M_\kappa \sum_{j=1}^{i-1} \frac{c_{ij}}{\Delta t} v_{nj} + \gamma_i \Delta t \frac{d}{dt} j_s(t_n)$    (2)

which have to be solved for $i = 1, \ldots, s$ with $t_i = t_n + \alpha_i \Delta t$,
$v_i = a^n + \sum_{j=1}^{i-1} a_{ij} v_{nj}$, and the new time solution results
from $a^{n+1} = a^n + \sum_{i=1}^{s} m_i v_{ni}$.

The coefficients $a_{ij}$, $c_{ij}$, $\alpha_i$, $\gamma_i$, $\gamma$ and $m_i$
determine the specific Rosenbrock method [13].
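To illustrate what "linear-implicit" means in practice, the following sketch
applies the simplest member of the Rosenbrock family (one stage, $\gamma = 1$,
i.e. the linearly implicit Euler method) to a scalar model problem. This is not
the RODAS3(2) scheme and not the FIT system: the model equation
`m*da/dt + nu(a)*a = j(t)` with `nu(a) = nu0*(1 + a**2)` and all names are
assumptions chosen only to show that each step needs one linear solve with the
Jacobian and no Newton iteration.

```python
def linearly_implicit_euler_step(a_n, t_n, dt, m, nu0, j, dj_dt):
    """One step of the one-stage Rosenbrock method with gamma = 1.

    Scalar model problem (an assumption, not the FIT system):
        m * da/dt + nu(a) * a = j(t),   nu(a) = nu0 * (1 + a**2)
    """
    # Right-hand side f(t_n, a_n) = j(t_n) - nu(a_n) * a_n
    f = j(t_n) - nu0 * (1.0 + a_n ** 2) * a_n
    # Derivative of nu(a)*a with respect to a (plays the role of the
    # differential reluctivity in the Jacobi matrix)
    jac = nu0 * (1.0 + 3.0 * a_n ** 2)
    # Scalar analogue of  M/(gamma*dt) + C~ M_nud C
    lhs = m / dt + jac
    # Single linear solve; the dj/dt term accounts for the time-dependent source
    v = (f + dt * dj_dt(t_n)) / lhs
    return a_n + v


def integrate(a0, t_end, dt, m, nu0, j, dj_dt):
    """Fixed-step time loop built on the step function above."""
    a, t = a0, 0.0
    for _ in range(int(round(t_end / dt))):
        a = linearly_implicit_euler_step(a, t, dt, m, nu0, j, dj_dt)
        t += dt
    return a
```

For a constant source `j(t) = 2` with `nu0 = 1`, the steady state satisfies
`a + a**3 = 2`, so the loop should settle at `a = 1`.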

An embedded lower order solution is delivered by a second coefficient set
$\hat{m}_i$, which provides an error criterion by comparison to the higher
order solution. For the error-controlled time integration of (1) only one
accuracy threshold has to be specified for the nonlinear transient time
marching process. Standard implicit DAE-1 time integration schemes such as the
$\theta$-methods, the backward differentiation schemes or embedded SDiRK
methods require the separate application of a nonlinear iteration scheme such
as the (modified) Newton-Raphson method or Quasi-Newton methods. These
linearization methods, however, need additional control parameters for the
nonlinear iteration accuracy and the corresponding adaptation of the linear
solvers, the maximum number of iterations and a relaxation parameter
optimization. Convergence problems of the Newton-Raphson scheme in
magnetodynamic simulations were reported for some practical applications
in [9].

In practical simulations with linear-implicit methods the evaluation of
the time derivative of the excitation current vector, which is required in the
applied linear-implicit schemes, can be a problem since the current functions
may be non-differentiable, e.g. in case of ramped piecewise linear current
excitations or if the shape is provided as a list of discrete measurements. In
practical calculations [7], however, it was found to be sufficient to provide
an approximation of the one-sided derivative
$\frac{d}{dt} j_s(t_n) \approx \frac{1}{\delta t} \left( j_s(t_n) - j_s(t_n - \delta t) \right)$.

Linear-implicit schemes exist for which the evaluation of the time derivative
vector of the right-hand side is not required, at the cost of additional
internal stages [12].
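The one-sided derivative mentioned above is easy to sketch for a ramped,
piecewise linear excitation. The ramp shape, its parameters and the function
names below are illustrative assumptions, not data from the paper.

```python
def ramp(t, t_ramp=1.0, j_max=1.0):
    """Piecewise linear excitation: linear rise up to t_ramp, then constant."""
    if t <= 0.0:
        return 0.0
    if t >= t_ramp:
        return j_max
    return j_max * t / t_ramp


def one_sided_derivative(j, t, delta_t):
    """Backward difference (j(t) - j(t - delta_t)) / delta_t."""
    return (j(t) - j(t - delta_t)) / delta_t
```

At the kink `t = t_ramp` the function has no derivative, but the backward
difference still returns the slope of the segment that has just been traversed.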

3 Adaptive Time-Stepping 

Adaptive time stepping can be implemented elegantly with embedded Implicit
Runge-Kutta (IRK) methods, which deliver for each time step both a
solution of a given order and an embedded solution of lower order. Their
difference $y = a_p - \hat{a}_{\hat{p}}$, with $p$ and $\hat{p}$ the orders of
the two Runge-Kutta solutions, can be used to produce an error criterion [12]
that yields an estimate of the truncation error of the time integration.

3.1 Error Norms and Error Tolerances 

The norm of the error vector $y$ can be used to predict the required time
step length to achieve an a priori user defined accuracy. A suitable error
vector norm is selected in [7] to be
$\|y\|_{err} := \max_i |y_i / (|a_i| + \bar{a}_i)|$, where $\bar{a}_i$
is an absolute tolerance for the component $a_i$, which is typically chosen to
correspond to one global absolute error parameter $\bar{a}$ [7]. The
introduction of this parameter $\bar{a}$ is necessary to avoid numerical
underflow as well as an unnecessarily fine time discretization if the magnetic
flux component vectors at that time are small in value compared to their
maxima during the whole transient process [4].

The influence of the absolute tolerance parameter on the adaptive scheme
is large [1, p. 131], [6], because the systems resulting from (1) are commonly
not scaled to yield algebraic solutions with a 2-norm in the interval [0, 1],
a typical silent assumption in the mathematical literature. The parameter
$\bar{a}$ can either be chosen by an a priori magnetostatic field simulation
for the maximum current $j_s$ during the time process or, in a more
conservative, adaptive way, by monitoring the maximum norm
$\bar{a} := \vartheta \cdot \max_{t_i \le t_n} \{\|a(t_i)\|_\infty\}$ with
$\vartheta \in [10^{-?}, 10^{-?}]$ of the previous solution vectors [7].
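A minimal sketch of the mixed relative/absolute error norm and of the adaptive
choice of the absolute tolerance described above is given below. It assumes one
global absolute tolerance for all components; the function names and the sample
value of `theta` are hypothetical and not taken from the paper.

```python
def error_norm(y, a, abs_tol):
    """max_i |y_i| / (|a_i| + abs_tol) for error vector y and solution a."""
    return max(abs(yi) / (abs(ai) + abs_tol) for yi, ai in zip(y, a))


def absolute_tolerance(previous_solutions, theta):
    """theta times the largest max-norm among the previous solution vectors."""
    return theta * max(max(abs(c) for c in a) for a in previous_solutions)
```

The absolute tolerance keeps the quotient bounded for components that are close
to zero: in `error_norm([1e-4, 2e-4], [1.0, 0.0], 1e-2)` the second component
dominates, but it contributes a finite value instead of a division by zero.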

Although the available error norms closely resemble each other, numerical
experience shows that the specific choice of the error norm and of the
absolute tolerance used in the problems severely affects the efficiency of
simulations (see e.g. [4], [7]). Coupling the relative time step error
tolerance $rtol$ and the entries of the Butcher scheme of the SDiRK method with
$tol_{NR} = 0.5\, \|(b^T - \hat{b}^T) A^{-1}\|^{-1}\, rtol$
according to [14], the nonlinear Newton-Raphson iterations are terminated
once $\|\delta a_{i+1}\| < tol_{NR} \|a_i\|$ holds. The preconditioned conjugate
gradient (PCG) iterations for the linearized systems are stopped by
$\|r_n\| < tol_{PCG} \|r_0\|$, involving the residual vectors $r_i$. Here the
tolerance $tol_{PCG}$ is adaptively coupled to $tol_{NR}$ following [10]. In
linear-implicit Rosenbrock methods only the iterations for the linear algebraic
systems are to be terminated accordingly.

3.2 Time Step Selection 

Based on the error norm $\|y\|_{err}$ as an estimation of the local truncation
error, the prediction of a new time step, either for the next simulation step
or for a repetition of the last step in case of insufficient accuracy, is
performed by a step size controller. A solution is rejected if
$rtol < \nu \|y\|_{err}$ occurs, where $\nu$ is an acceleration factor typically
set to 1.2. A controller scheme involving a maximum and minimum error and a
linear step size variation has been proposed in [15].

More advanced schemes can be obtained using control-theory models [11].
Understanding the simulation as a control loop leads to a re-interpretation of
a standard time step predictor originating from the mathematical literature
(e.g. [12]) as an I-controller scheme [11]. Numerical tests in [6] with more
sophisticated PI- or hybrid PI/I-controller schemes [11], however, show no
significant advantage over the I-controller scheme for common simulation
situations involving (1).
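The text does not spell out the step size predictor, so the following sketch
uses the textbook elementary controller, which is what is usually meant by an
I-controller: the new step is the old one scaled by
`(rtol / err) ** (1 / (p_hat + 1))`, with `p_hat` the order of the embedded
solution. Using the factor `nu` inside the formula, the growth and shrink
limits, and the function name are all assumptions made for this illustration.

```python
def i_controller(dt, err, rtol, order_embedded=2, nu=1.2,
                 grow_max=5.0, shrink_min=0.2):
    """Return (rejected, dt_new) for an embedded pair such as a 3(2) scheme.

    rejected   : True if rtol < nu * err, i.e. the step has to be repeated
    dt_new     : proposed step length for the repetition or for the next step
    """
    rejected = rtol < nu * err
    if err == 0.0:
        factor = grow_max                      # no measurable error: grow fully
    else:
        factor = (rtol / (nu * err)) ** (1.0 / (order_embedded + 1))
        factor = min(grow_max, max(shrink_min, factor))
    return rejected, dt * factor
```

Because the same factor `nu` appears in the rejection test and in the
predictor, a rejected step always leads to a smaller proposed step.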

4 Extrapolation Techniques 

In [8] simple extrapolation techniques, i.e., schemes to provide starting values
for the iterative linear system solver, mainly based on Taylor series
expansion, as well as more sophisticated schemes for the SDiRK scheme [2] have
been compared. To attain a problem independent robustness of the different
strategies, hybrid schemes were proposed where the evaluation of the minimal
residual norm decides about the choice of the specific start vector for the
iterative solution process at the new time step.

A new and even more refined hybrid scheme uses a subspace projection
strategy: Let two extrapolated start vectors $a_{0,1}^{(n+1)}$ and
$a_{0,2}^{(n+1)}$ be given, resulting from the non-hybrid extrapolation
techniques proposed in [8]. An orthonormalization of these vectors yields the
vectors $v_1$ and $v_2$ which set up the matrix $V = \{v_1, v_2\}$ onto which
the algebraic linear system $A^{(n+1)} a^{(n+1)} = f^{(n+1)}$ of the time
stepping process for time $t_{n+1}$ (as e.g. (2)) is projected:

    $V^T A^{(n+1)} V z = V^T f^{(n+1)}$.    (3)




166 Markus Clemens et al. 



Exact solution of the 2 x 2 system (3) for the coefficient vector $z$ allows
one to define the extrapolated vector $a_0^{(n+1)} := V z$. In case of
singularity of the system (3) the projection can be reduced to a scalar
equation $v_1^T A^{(n+1)} v_1 z = v_1^T f^{(n+1)}$ to yield
$a_0^{(n+1)} := z v_1$. For positive definite linear systems this
procedure provides linear combinations of two (or possibly more) different
extrapolated solution vectors which are optimal with respect to the spanned
subspace $\mathcal{V} = \mathrm{span}\{v_1, v_2\}$ and the linear system to be
solved, since

    $z = \arg\min_z \left( \tfrac{1}{2} z^T V^T A^{(n+1)} V z - z^T V^T f^{(n+1)} \right)$    (4)

holds. This optimality result can be extended to the consistently singular
algebraic linear systems resulting from the non-gauged magnetodynamic
formulation (1).
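The subspace projection step can be sketched in a few lines. The code below is
an illustration under stated assumptions, not the authors' implementation: the
system matrix is a small dense symmetric positive definite matrix stored as
nested lists, the orthonormalization is classical Gram-Schmidt, and all
function names and tolerances are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def orthonormalize(s1, s2, tol=1e-12):
    """Gram-Schmidt on two start vectors; returns one or two basis vectors."""
    n1 = dot(s1, s1) ** 0.5
    if n1 == 0.0:
        return []
    v1 = [a / n1 for a in s1]
    proj = dot(v1, s2)
    w = [b - proj * a for a, b in zip(v1, s2)]
    n2 = dot(w, w) ** 0.5
    if n2 <= tol * max(1.0, dot(s2, s2) ** 0.5):
        return [v1]                      # s2 adds no new direction
    return [v1, [a / n2 for a in w]]


def projected_start_vector(A, f, s1, s2):
    """Combine two extrapolated start vectors by projecting A x = f onto
    their span; falls back to a scalar projection if the 2 x 2 system is
    singular or the start vectors are linearly dependent."""
    basis = orthonormalize(s1, s2)
    if not basis:
        return [0.0] * len(f)
    v1 = basis[0]
    Av1 = matvec(A, v1)
    g11, r1 = dot(v1, Av1), dot(v1, f)
    if len(basis) == 2:
        v2 = basis[1]
        Av2 = matvec(A, v2)
        g12, g21, g22 = dot(v1, Av2), dot(v2, Av1), dot(v2, Av2)
        r2 = dot(v2, f)
        det = g11 * g22 - g12 * g21
        if abs(det) > 1e-14 * abs(g11 * g22):
            z1 = (r1 * g22 - g12 * r2) / det   # Cramer's rule for the
            z2 = (g11 * r2 - g21 * r1) / det   # projected 2 x 2 system
            return [z1 * a + z2 * b for a, b in zip(v1, v2)]
    z = r1 / g11                               # assumes v1^T A v1 != 0
    return [z * a for a in v1]
```

If the exact solution happens to lie in the span of the two start vectors, the
projection reproduces it; otherwise it returns the best combination in the
energy norm of `A`.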

5 Numerical Results 

Here, we mainly investigate the behavior of the linear-implicit
Rosenbrock-Wanner method RODAS3(2). This 4-stage method is of third order and
has an embedded scheme of second order [13]. Both the time integrator of third
order and the embedded scheme are stiffly accurate and L-stable and thus
applicable to the magnetodynamic DAEs of index 1.

This scheme is compared with a singly diagonal implicit Runge-Kutta scheme
(SDiRK3(2)) presented in [3], which was successfully adapted to time step
adaptive magneto-quasistatic simulations with (1) in [7] and [15]. Like the
RODAS3(2) scheme, this 4-stage method is of order 3 with an embedded solution
of second order and is stiffly accurate and L-stable for both orders. Whereas
in [15] a Successive Approximation technique is used for the evaluation of the
4 nonlinear intermediate stage solutions per time step in the SDiRK3(2), here
we use a Newton-Raphson scheme augmented with a scalar under-relaxation process
to ensure stability.

Both the linear-implicit method and the standard SDiRK scheme are
applied to a test configuration consisting of an eddy current iron plate with
a hole (11,750 degrees of freedom) with different current excitation forms
using the B-H curve of the TEAM 25 benchmark problem. The results in
Fig. 1 show that both the linear-implicit RODAS3(2) method and the SDiRK3(2)
scheme robustly follow the prescribed relative error tolerances
$rtol = 10^{-?}$ to $10^{-?}$, where minor transgressions are due to safety
factors in the norms [7]. For low accuracy requirements ($rtol > 10^{-?}$) the
RODAS3(2) scheme exhibits smaller simulation times on a Pentium III 933 MHz PC
than the SDiRK3(2) scheme, which appears to be faster for the smallest
tolerance $rtol = 10^{-?}$. Note that even the ramped piecewise linear current
excitation function in Fig. 1(b) could be integrated without problems with both
schemes.

Fig. 1. Comparison of different rel. error thresholds rtol for the RODAS3(2) and
SDiRK3(2) methods with a nonlinear eddy current test example (plate with hole;
simulation times on a 933 MHz PC): (a) sinusoidal current excitation function,
(b) ramped excitation curve.

The results of a time step adaptive transient simulation of the TEAM
21b problem, a 50 Hz current driven nonlinear magnetic field benchmark
problem, are depicted in Fig. 2 for both 4-stage schemes. For a FIT model
with 475,000 degrees of freedom [9] the number of required matrix vector
multiplications (MxV) and simulation times on a Pentium III 933 MHz PC
are given in Table 1 for different relative accuracy values
$rtol = 10^{-?}$ to $10^{-?}$.

The results in Table 1 indicate that for lower accuracy the linear-implicit
RODAS3(2) again appears to be more efficient than the SDiRK3(2) scheme,
which benefits for higher accuracy from the local quadratic convergence of
its Newton-Raphson process used for the interior stages.

Finally, the results of numerical experiments with extrapolated start values
are shown in Table 2 for the plate with a hole problem (linear material)
and a sinusoidal ramped excitation (see Fig. 1), the TEAM 11 hollow
conductive sphere in an abruptly started magnetic field and again the nonlinear
time-harmonic TEAM 21b benchmark problem. Here the total numbers of
required matrix-vector multiplications (MxV) corresponding to the
computational costs are compared for several extrapolation strategies described
in [8] and in (3). The time integration schemes used here are conventional
methods, namely a fixed time stepping backward Euler differentiation (BDF1)
scheme and the time step adaptive SDiRK3(2) method, compared to the
linear-implicit RODAS3(2) scheme. Using the sophisticated extrapolation schemes
leads to a clear advantage in terms of computational costs.

Table 1. TEAM 21b Problem: Comparison of the linear-implicit RODAS3(2) and
the SDiRK3(2) scheme. (Number of rejected steps in brackets.)

rtol             Scheme      MxV    CPU time   Steps       NR-cycles
1.0 · 10^{-?}    RODAS     31910     3581 s    12 (3)      -
1.0 · 10^{-?}    SDiRK     56869     5769 s    11 (6)      141
5.0 · 10^{-?}    RODAS     39658     3992 s    16 (5)      -
5.0 · 10^{-?}    SDiRK     65727     5961 s    15 (7)      185
2.5 · 10^{-2}    RODAS     87277     8434 s    30 (20)     -
2.5 · 10^{-2}    SDiRK     90196     8172 s    22 (12)     289
1.0 · 10^{-?}    RODAS    427599    41240 s    256 (25)    -
1.0 · 10^{-?}    SDiRK    205270    20620 s    97 (16)     894

Fig. 2. Calculated and measured results of magnetic flux densities for the TEAM
21b problem for $rtol = 10^{-?}$, for which both RODAS3(2) and SDiRK3(2) yield
identical results. Included are also the relative error distributions along
measurement path 1 for the simulations with $rtol = 10^{-?}$ and
$rtol = 10^{-?}$.

Table 2. Comparison of extrapolation strategies using fixed and adaptive time
stepping schemes: Total number of matrix vector multiplications.

Meth.  Extrapolator                                     EC plate      TEAM 11       TEAM 21b

BDF1, fixed time steps: tol_PCG =                       1.0·10^{-?}   1.0·10^{-?}   1.0·10^{-?}
 1)    0                                                    3951         16085        136736
 2)                                                         3424          6663        133657
 3)    a^{(n+1)} = a^{(n)} + \Delta t \dot{a}^{(n)}         2554         16730        129022
 4)    2) or 3) by min                                      2714          6703        129167
 5)    2) and 3) by eqn. (3)                                 923          2889        118516

SDiRK3(2), adaptive: rtol =                             10^{-?}       2.5·10^{-?}   2.5·10^{-?}
 6)    = 1)                                                 4824         17155         99552
 7)    = 3)                                                 4473         10925         69946
 8)    Continuous extension [2]                             3877          7466         58799
 9)    Stage extrapolation [2]                              4145         10787         66660
10)    8) or 9) by min ||M a^{(n+1)} - ...                  3777          6688         61478

RODAS3(2), adaptive: rtol =                             10^{-?}       2.5·10^{-?}   2.5·10^{-?}
11)    = 1)                                                 5123          8135         67883

6 Conclusion

Linear-implicit Rosenbrock-type time integration methods were proposed for
nonlinear transient magnetic field simulations using the Finite Integration
Technique as spatial discretization method. These methods, featuring a
built-in Newton procedure, allow for an error-controlled adaptive time stepping
of nonlinear magnetodynamic problems. A comparison to standard SDiRK
schemes of the same order shows advantages of the Rosenbrock schemes in
terms of numerical efficiency for the low accuracy required in
magnetodynamic field simulations. For the generation of start values for the
iterative solution of the algebraic systems an optimal subspace projection
extrapolation strategy was introduced and tested with good results.



Abstract. The finite element model of a superconductive dipole magnet is equipped
with a specialised conductor model which accounts for the inter-strand currents 
caused by the ramping of the magnet without explicitly meshing the individual 
strands. 



1 Introduction 

Superconductive magnets are planned for the new accelerator facilities at the
“Gesellschaft für Schwerionenforschung” (GSI) in Darmstadt, Germany. The
ramping of the magnets with 4 T/s will cause eddy currents in the coils of
the magnet which diminish the quality of the aperture field and generate
additional losses. Three different mechanisms are distinguished: persistent
currents in the superconductive filaments, the coupling between filaments
and the coupling between the strands of the Rutherford cable [6]. Persistent
currents are simulated in [5]. This paper focuses on the simulation of
inter-strand eddy currents within finite element (FE) magnet models.



2 Finite element magnet model 



Eddy current phenomena are described by the partial differential equation
(PDE)

    $\nabla \times (\nu \nabla \times A) + \sigma \frac{\partial A}{\partial t} = J_s$    (1)

in terms of the magnetic vector potential $A$, the source current density
$J_s$, the reluctivity $\nu$ and the conductivity $\sigma$. Here, the magnetic
field is computed in a 2D cross-section $\Omega_{fe}$ of the superconductive
magnet (Fig. 1). The Cartesian coordinate system parallel to $\Omega_{fe}$ is
denoted by $(x, y)$. For convenience, some formulae are expressed in terms of
the equivalent polar coordinate system $(r, \vartheta)$. The currents are
perpendicular to the cross-section. Because of the longitudinal magnet
geometry, the magnetic flux can be assumed to lie in the considered plane.
The 2D formulation is

    $-\frac{\partial}{\partial x}\left(\nu \frac{\partial A_z}{\partial x}\right) - \frac{\partial}{\partial y}\left(\nu \frac{\partial A_z}{\partial y}\right) + \sigma \frac{\partial A_z}{\partial t} = J_{s,z}$,    (2)

where $A_z$ and $J_{s,z}$ are the z-components of $A$ and $J_s$ respectively.

Fig. 1. (a) Magnetic flux lines for the cos $\theta$-type dipole superconductive
magnet and (b) magnetisation due to cross-over eddy currents in the coils.

The discretisation of (2) by e.g. linear triangular FE shape functions
$N_i(x, y)$ yields the system of equations

    $K u + g(u) = f$    (3)

where $u$ contains the FE degrees of freedom for $A_z$,

    $K_{ij} = \int_{\Omega_{fe}} \left( \nu \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \nu \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega$,    (4)

    $f_i = \int_{\Omega_{fe}} J_{s,z} N_i \, d\Omega$.    (5)

Because the strands of the Rutherford cable are not considered in full detail
in the FE model, discretising the eddy current term in (2) is not
straightforward. Different eddy current mechanisms give rise to several eddy
current contributions of the form $g(u)$ as described below.
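For linear triangular shape functions the integrals (4) and (5) can be
evaluated in closed form on each element. The sketch below shows the standard
element formulas for a single triangle, assuming a constant reluctivity and a
constant source current density on the element; the function names are
hypothetical and the assembly into the global system is omitted.

```python
def triangle_stiffness(p1, p2, p3, nu):
    """Element matrix of (4) for a linear triangle with constant reluctivity.

    Uses grad N_i = (b_i, c_i) / (2 * area), so that
    K_ij = nu * (b_i * b_j + c_i * c_j) / (4 * area).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    b = [y2 - y3, y3 - y1, y1 - y2]
    c = [x3 - x2, x1 - x3, x2 - x1]
    area = 0.5 * abs(x1 * b[0] + x2 * b[1] + x3 * b[2])
    return [[nu * (b[i] * b[j] + c[i] * c[j]) / (4.0 * area)
             for j in range(3)] for i in range(3)]


def triangle_load(p1, p2, p3, j_sz):
    """Element vector of (5) for a constant current density on the element."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    area = 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    return [j_sz * area / 3.0] * 3
```

Each row of the element matrix sums to zero, which reflects that a constant
vector potential carries no magnetic energy.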



3 Rutherford cable 

The cos $\theta$-type dipole magnet (Fig. 1) has coils featuring windings of
Rutherford-type cable (Fig. 2). The cable originally has a rectangular
cross-section; it consists of two twisted layers of strands and is keystoned in
order to provide a better fit to the magnet geometry. Each strand consists of a
copper wire with several embedded superconductive NbTi filaments. All strands
are connected in series and, in normal operation, they only carry current in
the filaments. If the filaments are not saturated, the voltage drop along the
cable is zero in the static regime. The copper matrix takes over the current if
the filaments get saturated. The strands are insulated from each other. To
ensure a sufficient robustness against the quenching of the magnet, however,
a current redistribution between the strands has to be possible up to a
certain extent. Therefore, the insulation between the strands is required to be
partially conductive. During the ramping of the magnet, the time-varying
field induces eddy currents crossing the insulation and forming loops over
several strands. The eddy currents cause a deterioration of the quality of the
magnetic field in the magnet aperture. This phenomenon has to be simulated
accurately in order to predict magnet operation during ramping at an early
stage in the magnet design. Two different kinds of current paths connecting
several strands are distinguished: rectangular paths formed by neighbouring
strands, carrying so-called adjacency eddy currents, and diamond-shaped
paths formed by strands of different layers, which carry so-called cross-over
eddy currents. The corresponding inter-strand resistances are called adjacency
and cross-over resistance (Fig. 2). The Rutherford cable itself features a
better insulation. Therefore, currents do not migrate between the windings of
the coil.

Fig. 2. Geometry of a Rutherford cable (labels: twisted strand, cross-over loop,
adjacency loop).

It is not possible to model all geometric details of the Rutherford cable
within the overall FE magnet model. The dimensions of the single strands
and especially those of the single filaments are very small compared to the
dimensions of the overall magnet, which prohibits the use of a detailed FE
mesh. Resolving the individual windings is not recommended either. More
efficient FE models are obtained if the discretisation goes beyond the
geometrical barriers of single filaments, strands and windings. Problem-tailored
modelling techniques are developed here, dealing with the particular geometric
properties of the Rutherford cable as applied in cos $\theta$-type
superconductive magnets.



4 Finite element model for Rutherford cable



4.1 Keystoning 



The keystoning of the Rutherford cable will cause the density of
superconductive filaments to be larger at the inner side of the coil compared
to the outer side. As a consequence, the applied current density $J_{s,z}$
depends on the spatial coordinate $r$ in the considered cross-section:

    $J_{s,z}(r) = \frac{N_{turns} I_{app}}{(r_2 - r_1)^2} \left( \frac{r_2 - r}{2 h_1} + \frac{r - r_1}{2 h_2} \right)$    (6)

where $r_1$ and $r_2$ are the inner and outer radii of the coil, $2h_1$ and
$2h_2$ are the thicknesses of the Rutherford cable at $r_1$ and $r_2$
respectively, $N_{turns}$ is the number of strands per winding and $I_{app}$ is
the applied current (Fig. 2). Keystoning is taken into account in the FE model
by accurately integrating (5) with the current density defined by (6).
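As a small illustration of the keystoning effect, the function below evaluates
a radially varying current density. It assumes that (6) interpolates linearly
between the value `n_turns * i_app / ((r2 - r1) * 2 * h1)` at the inner radius
and `n_turns * i_app / ((r2 - r1) * 2 * h2)` at the outer radius; this reading
of (6), the function name and the sample numbers are assumptions.

```python
def keystoned_current_density(r, r1, r2, h1, h2, n_turns, i_app):
    """Current density at radius r for a cable of thickness 2*h1 at r1
    and 2*h2 at r2 (linear interpolation between the two end values)."""
    scale = n_turns * i_app / (r2 - r1) ** 2
    return scale * ((r2 - r) / (2.0 * h1) + (r - r1) / (2.0 * h2))
```

For a cable that is thinner at the inner side (`h1 < h2`), the density is
largest at `r1` and decreases towards `r2`, in line with the statement that the
filament density is larger at the inner side of the coil.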



4.2 Adjacency eddy currents 

Analytical formulae dealing with adjacency eddy current effects are reported
in e.g. [4,6]. There, a distinction is made between adjacency eddy currents
$J_{pa,z}$ closing in neighbouring strands due to the time-varying magnetic
field $B_p$ perpendicular to the long side of the Rutherford cable, and eddy
currents $J_{la,z}$ closing in neighbouring strands due to the time-varying
parallel magnetic field $B_l$. In analytical models, the additional
magnetisation due to eddy currents is characterised by the time constants
$\tau_{pa}$ and $\tau_{la}$. The eddy currents $J_{pa,z}$ and $J_{la,z}$ flow
in cross-sections of the Rutherford cable with different orientation and shape.
Their treatment within the FE cable model developed here is nevertheless the
same. Specialised conductor models have already been developed for foil
windings and multi-conductor windings in [1] and [2], respectively. In this
paper, a similar modelling technique is developed for the particular Rutherford
cable layout. Another possible approach is the embedding of a detailed cable
model as a macro-element into the magnet model, as is done for machine windings
in [3].

Consider a current redistribution zone $\Omega_q$ consisting of a single layer
of all strands either along the short side or along the long side of the cable.
Due to the finite resistance of the insulation between the individual strands,
the current can redistribute within $\Omega_q$. Further migration between the
cable windings is prevented by the cable insulation, which has a substantially
higher resistivity. The redistribution of the current in the direction
perpendicular to the considered layer is not considered, since this
redistribution is treated by separate current redistribution zones oriented
perpendicular to the considered current redistribution zone $\Omega_q$. This
splitting makes it possible to consider anisotropic adjacency resistances, as
e.g. in the case where the Rutherford cable has a resistive core between the
two longitudinal layers (Fig. 2). The conductivity $\sigma_{pa}$ experienced by
the adjacency eddy currents can be computed analytically and depends on the
true adjacency resistance of the cable, which is commonly obtained by
measurement, the dimensions of the strands and the twist pitch of the
Rutherford cable. The electric field $E_z$ in the z-direction is unknown. The
eddy current density in layer $\Omega_q$ due to the ramping of the magnetic
field is

    $J_{pa,z} = -\sigma_{pa} \frac{\partial A_z}{\partial t} + \sigma_{pa} E_z$.    (7)

The net current

    $\int_{\Omega_q} J_{pa,z}(r, \vartheta) \, d\Omega$    (8)

through the current redistribution zone $\Omega_q$ is zero.

Since the Rutherford cable is comparatively long in the z-direction and
since the current can close through the perfectly conductive filaments of the
twisted strands, the electric field can be assumed to be constant over
$\Omega_q$. For each current redistribution zone $\Omega_q$, a constant shape
function $M_q(x, y)$ is defined to have the value 1 in $\Omega_q$ and 0 in
$\Omega_{fe} \setminus \Omega_q$. The electric field can be expressed by

    $E_z(x, y) = \sum_q E_{z,q} M_q(x, y)$.    (9)



The adjacency current density vector due to the perpendicular time-varying
magnetic field, $j_{pa}$, is obtained by weighting (7) by the FE shape functions
$N_i(x, y)$ associated with the magnetic vector potential:

    $j_{pa} = -W_{pa} \frac{du}{dt} + Z_{pa} e_{pa}$    (10)

where

    $W_{pa,ij} = \int_{\Omega_{fe}} \sigma_{pa} N_i(x, y) N_j(x, y) \, d\Omega$,    (11)

    $Z_{pa,iq} = \int_{\Omega_{fe}} \sigma_{pa} N_i(x, y) M_q(x, y) \, d\Omega$    (12)

and $e_{pa}$ is the vector with all degrees of freedom $E_{z,q}$. The additional
constraint forcing the net current (8) to zero is weighted by the FE shape
functions $M_p(x, y)$ associated with the electric field unknowns, yielding

    $-Z_{pa}^T \frac{du}{dt} + G_{pa} e_{pa} = 0$    (14)

where

    $G_{pa,pq} = \int_{\Omega_{fe}} \sigma_{pa} M_p(x, y) M_q(x, y) \, d\Omega$.    (15)

The FE conductor model representing adjacency eddy currents in neighbouring
strands due to the time-varying magnetic field is inserted in the global
FE model:

    $\begin{bmatrix} W_{pa} & 0 \\ -Z_{pa}^T & 0 \end{bmatrix} \frac{d}{dt} \begin{bmatrix} u \\ e_{pa} \end{bmatrix} + \begin{bmatrix} K & -Z_{pa} \\ 0 & G_{pa} \end{bmatrix} \begin{bmatrix} u \\ e_{pa} \end{bmatrix} = \begin{bmatrix} f \\ 0 \end{bmatrix}$    (16)

which, after discretisation in time and scaling the constraint equations by
an appropriate factor, results in a symmetric, positive definite system of
equations.
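The remark on symmetry can be made explicit with a short worked step. The
derivation below assumes a backward Euler step of length $\Delta t$ instead of
the time discretisation actually used in the paper, and it reads (16) as the
block system with $W_{pa}$ and $-Z_{pa}^T$ in the time-derivative matrix and
$K$, $-Z_{pa}$ and $G_{pa}$ in the algebraic matrix; the superscripts $n$ and
$n+1$ for the time levels are notation introduced here.

```latex
% Backward Euler applied to (16), from (u^n, e_pa^n) to (u^{n+1}, e_pa^{n+1})
\begin{align*}
\Bigl(\tfrac{1}{\Delta t} W_{pa} + K\Bigr) u^{n+1} - Z_{pa}\, e_{pa}^{n+1}
  &= f^{n+1} + \tfrac{1}{\Delta t} W_{pa}\, u^{n}, \\
-\tfrac{1}{\Delta t} Z_{pa}^{T} u^{n+1} + G_{pa}\, e_{pa}^{n+1}
  &= -\tfrac{1}{\Delta t} Z_{pa}^{T} u^{n}.
\end{align*}
% Scaling the second (constraint) row by \Delta t gives a symmetric block system
\[
\begin{bmatrix}
  \tfrac{1}{\Delta t} W_{pa} + K & -Z_{pa} \\
  -Z_{pa}^{T} & \Delta t\, G_{pa}
\end{bmatrix}
\begin{bmatrix} u^{n+1} \\ e_{pa}^{n+1} \end{bmatrix}
=
\begin{bmatrix}
  f^{n+1} + \tfrac{1}{\Delta t} W_{pa}\, u^{n} \\
  -Z_{pa}^{T} u^{n}
\end{bmatrix}.
\]
```

In this form the off-diagonal blocks are transposes of each other and both
diagonal blocks are symmetric, so the scaled system matrix is symmetric, which
is the property needed for the Conjugate Gradient method.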

The correct modelling of eddy currents closing through adjacent strands
depends on the choice of the set of current redistribution zones $\Omega_q$,
which are represented in the coupled FE model by the constant shape functions
$M_q(x, y)$. Two types of current redistribution zones are selected: zones
corresponding to layers of strands along the long side of the Rutherford cable
and those corresponding to layers of strands along the short side of the
cable. The set of shape functions $M_q(x, y)$ constitutes a discretisation of
the unknown electric field. For large coils, this approach may still introduce
too large a number of additional unknowns. The formulation (16), however,
allows for a further coarsening of the FE model. The numerical current
redistribution zones $\Omega_q$ can consist of several physical current
redistribution zones. It is shown in [2] that the FE model already provides
reliable results for numerical electric field discretisations which are
substantially coarser than the true geometry of the cable strands. It is also
possible, and sometimes even explicitly recommended, to select FE shape
functions other than piecewise constants for $M_q(x, y)$. A set of wavelets
containing the frequencies according to the periodical distribution of strands
may yield extremely efficient FE models.



4.3 Cross-over eddy currents 

Due to the typical twisting of the strands in a Rutherford cable,
diamond-shaped current loops arise if the current migrates between both layers
and follows paths with different orientations (Fig. 2). Only current paths with
two cross-over points are considered. Due to twisting, the cross-over currents
close without experiencing additional resistance at the short sides of the
Rutherford cable. Since such a cross-over loop necessarily extends to the
borders of the Rutherford cable, the cross-over eddy current phenomenon
cannot be considered as a local eddy current effect.

Cross-over eddy currents are induced by a time-varying, perpendicular
magnetic field. The magnetic flux through a cross-section at the azimuthal
coordinate $\vartheta$ of the Rutherford cable with an r-z-plane is

    $\phi(\vartheta) = \ell_z \left( A_z(r_2, \vartheta) - A_z(r_1, \vartheta) \right)$    (17)

where $\ell_z$ is the length of the magnet in the z-direction. The magnetising
flux induced by the cross-over eddy currents is

    $\phi_{pc} = -\tau_{pc} \frac{d\phi}{dt}$    (18)

where $\tau_{pc}$ is the time constant for cross-over magnetisation which
depends on the cross-over resistance and the geometry of the Rutherford cable
[4]. The magnetising flux is represented in the FE model by current sheets at
$r = r_1$ and $r = r_2$. The corresponding current densities are



Jz{r,i9) 



<Ppc ri + T 2 ( (r) 
4 (r 2 -n) 2 V 7n 



^r2{r) 

7^2 



(19) 



where $\delta_{r_i}(r)$ denotes the Kronecker delta function located at $r_i$ and $\gamma$
represents the thickness of the current sheets. Weighting (19) by the FE shape
functions $N_i(x,y)$ results in an additional term representing the cross-over
eddy current effect:

dxi 

9pc — (^0) 

where, with $\Delta N_k(\vartheta) = N_k(r_2,\vartheta) - N_k(r_1,\vartheta)$,

Mpcij = r ■ ( 21 ) 

Jo ^ v^2 — j 

The integrations in (21) are performed over the inner and outer boundaries
of the coil. The load term (20) only affects FE nodes at these boundaries. 
This feature reflects the non-local character of cross-over eddy currents. The 
coupled FE model accounting for eddy currents in Rutherford cable both in 
adjacent strands and due to cross-over coupling is 



r -^pa + Ol 


d 


U 


+ 


'K -Zpa' 




u 






1 

1 

0 


dt 


^pa 


. 0 Gpa 








0 



This system is discretised in time by Galerkin-type linear time steps. The 
systems of equations are solved by the Conjugate Gradient method with a 
Symmetric Successive Over-Relaxation preconditioner. 
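
As an illustration of the solver named above, the following minimal sketch applies the Conjugate Gradient method with an SSOR preconditioner to a small dense symmetric positive definite system. It is not the implementation used in the paper: the function names, the relaxation factor `omega = 1.2` and the dense list-of-lists storage are assumptions made for readability, whereas an FE code would use sparse storage.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def ssor_apply(A, r, omega=1.2):
    """Return z = M^{-1} r for the SSOR preconditioner
    M = (D + omega*L) D^{-1} (D + omega*U) / (omega*(2 - omega)),
    where A = L + D + U is symmetric positive definite."""
    n = len(A)
    scale = omega * (2.0 - omega)
    # Forward sweep: solve (D + omega*L) u = scale * r.
    u = [0.0] * n
    for i in range(n):
        s = scale * r[i] - omega * sum(A[i][j] * u[j] for j in range(i))
        u[i] = s / A[i][i]
    # Backward sweep: solve (D + omega*U) z = D u.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][i] * u[i] - omega * sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / A[i][i]
    return z


def pcg_ssor(A, b, omega=1.2, tol=1e-10, max_iter=200):
    """Preconditioned Conjugate Gradient iteration for A x = b."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero start vector
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

The preconditioner is applied as one forward and one backward triangular sweep, so its cost per iteration is comparable to one matrix-vector product.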



5 Application 

The 2D transient FE model of a quarter of a superconductive dipole magnet 
is equipped with the specialised Rutherford cable model. The magnetic dipole 
field is increased from 0.2 T to 2 T with a ramp rate of 4 T/s. In the simula- 
tions carried out here, it is assumed that the superconductive filaments are 
not saturated and hence perfectly conductive. The ramping of the magnetic 
field causes eddy currents in the Rutherford cable and hence, disturbances to 
the magnetic field in the aperture. The adjacency and cross-over eddy current
densities are shown at the time instant when the magnetic dipole field is
2 T. Due to the high perpendicular magnetic field at $\vartheta = 0$, significant eddy
currents appear there. In contrast to the adjacency eddy current density, the
cross-over eddy current density reflects a symmetry along the center line of
the coil (Fig. 3). The magnetisation due to cross-over coupling is shown in
Fig. 1b. Thanks to the neighbouring iron yoke, only a small fraction of the
cross-over magnetising flux enters the aperture of the magnet. The magnetisation
due to eddy currents causes a small deterioration of the field quality of
the magnetic dipole field in the aperture. The specialised cable model avoids
the construction of detailed meshes in the winding area and hence yields small
FE models for which transient simulation following a ramped excitation cycle
becomes feasible.

Fig. 3. Absolute values of (a) the adjacency eddy current density due to the
time-varying azimuthal field and (b) the cross-over eddy current density in the
Rutherford cable.

6 Conclusions 

The specialised FE cable model accounts for the adjacency eddy currents in 
neighbouring strands and the cross-over eddy currents between the twisted 
layers of Rutherford cable without explicitly meshing the individual strands of 
the cable. This results in considerably smaller FE models for superconductive 
magnets and enables an accurate, transient analysis of the magnetic field 
during the ramping of the magnets.

Abstract. New modeling technology is developed that allows engineers to define
the frequency range, layout parameters, material properties and desired accuracy for
automatic generation of simulation models of general passive electrical structures.
It combines electromagnetic (EM) accuracy of parameterized passive models with
the simulation speed of analytical models. The adaptive algorithm does not require
any a priori knowledge of the dynamics of the system to select an appropriate
sample distribution and an appropriate model complexity. With this technology,
designers no longer have to put up with legacy modeling techniques or invest resources
in examining new ones.

1 Introduction 

Component and circuit models are a cornerstone of EDA (Electronic Design 
Automation) technology. With wireless and wireline designs constantly in- 
creasing in complexity and operating at higher frequencies, design engineers 
push the limits of their EDA tool’s passive analytical models. Often, these 
passive models are used outside their operational range, causing the EDA tool 
to return inaccurate simulation results. The inconsistencies of legacy model- 
ing techniques from the 1970s and 1980s hinder the accuracy of these models 
when applied to different processes and frequencies. Exceeding a model’s 
frequency limit causes errors due to the model’s failure to account for higher- 
order propagation modes. Limitations of the equivalent circuit model, such 
as frequency independent inductive or capacitive elements, also lead to sim- 
ulation errors. Since most EDA tools do not proactively report such errors, 
they propagate through the design flow and may not be discovered until a 
prototype fails to perform as expected. To avoid errors and inconsistencies, 
full-wave EM simulation is required to fully characterize the structure and
produce an accurate S-parameter model of the discontinuity that is then used 
by the circuit simulator. 

Developing new models is not a trivial task! To model a single parameter over 
a range of values, several sample points are required. Since the model can be 
a function of many layout parameters (line width, length, metal thickness, 
dielectric constant, substrate thickness, loss tangent, etc.), the number of
samples grows exponentially with the number of layout parameters.
Also, developing a new model usually requires a highly skilled person
working for an extended period (several weeks or even months) to build,
test and produce the desired analytical model. If the requirement is for a
complete library of models, the total effort is multiplied by the number of
models sought. This task needs to be weighed against measurement-based or
EM-based modeling on a case-by-case basis.

Fig. 2. Step Two: Multinomial models are created at discrete frequencies.

Some common approaches to modeling issues have limiting factors [1,2].
Methods based on pre-calculated equivalent circuits, look-up tables,
fitting equations and interpolation techniques may rely on a limited
number of samples and on inadequate interpolation methods. One clear
example where the dependability of these techniques comes into question is
with high-Q resonant circuits such as those used in narrow band filters. Using
discrete data grids and interpolation techniques with such circuits might
cause the generated model to suffer from either "oversampling" or
"undersampling". With oversampling, too many data samples are collected and model
generation is inefficient; with undersampling, too few data
samples are collected and the model is not completely defined.

As an alternative to building classic analytical models, engineers can utilize a 
full-wave EM modeling tool to fully characterize a given passive component.
This method permits accurate characterization of the actual passive struc- 
ture to be used, accounting for higher-order mode propagation, dispersion and 
other parasitic effects. However, the calculation time required for full-wave
EM simulation of a given component makes real-time circuit tuning impossi- 
ble. A new efficient adaptive sampling and modeling technique addresses this 
model accuracy dilemma. The Multidimensional Adaptive Parameter Sam- 
pling algorithm (MAPS) selects a limited set of data samples in consecutive 
iterations, and interpolates all S-parameter data using rational and multino- 
mial fitting models. This algorithm allows important details to be modeled 
by automatically sampling the response of the structure more densely where 
the S-parameters are changing more rapidly. The goal is minimizing the to- 
tal number of samples needed, while maximizing the information provided 
by each new sample. The new modeling technique combines the speed and 
flexibility of analytical models, and the accuracy and generality of full-wave
EM simulation in one compact parameterized passive model [3,4]. 



2 Adaptive modeling and sampling technique 

The MAPS technique builds a global fitting model of the chosen parameters, 
handling frequency and geometrical dependencies separately. Multidimen- 
sional polynomial (or multinomial) fitting techniques are used to model the 
geometrical dependencies, while rational fitting techniques [5] are used to 
handle frequency dependencies. The modeling process does not require any a 
priori knowledge of the circuit under study. Different adaptive algorithms are 
combined to efficiently generate a parameterized fitting model that meets the
predefined accuracy. This includes the adaptive selection of an optimal number
of data samples along the frequency axis and in the geometrical parameter
space, and adaptive selection of the optimal order of the multinomial-fitting
model.

Fig. 3. Step Three: Creation of the coefficients of orthogonal multinomials at
discrete frequencies.

Fig. 4. Step Four: Calculation of coefficients of orthogonal multinomials over the
entire frequency range.

The number of data points is selected to avoid oversampling and undersam- 
pling. The process of selecting data points and building models in an adaptive 
way is called reflective exploration [6]. Reflective exploration is useful when
the process that provides the data is very costly, which is the case for full-wave
EM simulators. Reflective exploration requires reflective functions that
are used to select new data points. For example, the difference between two 
fitting models can be used as a reflective function. Also, some physical rules, 
such as a passivity-check, can be used as a reflective function. The modeling 
process starts with an initial set of data points. New data points are selected 
near the maximum of the reflective function until the desired accuracy is 
reached. 
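
The loop described above can be sketched as follows, assuming a single scalar parameter, a cheap stand-in function in place of the full-wave EM simulator, and the disagreement between a piecewise-linear and a nearest-sample model as reflective function. All names (`reflective_exploration`, `simulate`, `demo_response`, `tol`) and the choice of the two models are hypothetical simplifications; MAPS itself uses rational and multinomial fitting models.

```python
import bisect
import math


def linear_model(xs, ys, x):
    """Piecewise-linear interpolant through the sorted samples (xs, ys)."""
    i = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1.0 - t) * ys[i - 1] + t * ys[i]


def nearest_model(xs, ys, x):
    """Piecewise-constant model: value of the closest sample."""
    j = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return ys[j]


def reflective_exploration(simulate, lo, hi, tol, max_samples=50):
    """Add samples where two fitting models disagree most, until they agree within tol."""
    xs = [lo, 0.5 * (lo + hi), hi]          # initial set of data points
    ys = [simulate(x) for x in xs]
    while len(xs) < max_samples:
        # Candidate points: midpoints of the current sample intervals.
        candidates = [0.5 * (a + b) for a, b in zip(xs, xs[1:])]
        # Reflective function: difference between the two models.
        errors = [abs(linear_model(xs, ys, c) - nearest_model(xs, ys, c))
                  for c in candidates]
        worst = max(range(len(candidates)), key=errors.__getitem__)
        if errors[worst] < tol:
            break                           # desired accuracy reached
        x_new = candidates[worst]
        k = bisect.bisect(xs, x_new)
        xs.insert(k, x_new)
        ys.insert(k, simulate(x_new))       # the only costly call per iteration
    return xs, ys


def demo_response(x):
    """Stand-in for an expensive simulation: a sharp transition near x = 0.3."""
    return math.tanh(20.0 * (x - 0.3))
```

Calling `reflective_exploration(demo_response, 0.0, 1.0, tol=0.05)` places the new samples around the transition at x = 0.3, while the flat part of the response keeps only its initial samples.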

The model complexity is automatically adapted to avoid overmodeling (over- 
shoot or ringing) and undermodeling, and the model covers the whole param- 
eter and frequency space and can easily be used for optimization purposes. 
The MAPS modeling technique follows four steps to adaptively build a model. 



— Step 1: The frequency response of the circuit is calculated at a number of 
discrete sample points (using the Agilent Momentum full-wave EM simulator
[7]). The Adaptive Frequency Sampling (AFS) algorithm [5] selects
a set of frequencies and builds a rational model for the S-parameters over 
the desired frequency range (Figure 1). 

— Step 2: A multinomial is fitted to the S-parameter data at multiple dis- 
crete frequencies (Figure 2). 

— Step 3: This model is written as a weighted sum of orthonormal multi- 
nomials. The multinomials only depend on the layout parameters. The 
weighting coefficients preceding the orthonormal multinomials in the sum 
are only frequency dependent (Figure 3). 

— Step 4: Using the AFS models built in step one, the coefficients can be
calculated over the whole frequency range (Figure 4). These coefficients, 
together with the orthonormal multinomials, are stored in a database for 
use during extraction afterwards. 



3 Example 

The automated modeling technique was used to generate analytical circuit
models for all sub-parts (transmission line, open end, slot coupler, step in
width, corner-fed patch) of a slot-coupled microstrip-fed patch antenna
structure (figure 5). This modeling step is a one-time, up-front time investment.
A double-sided duroid substrate was used (thickness = 31 mil & 15 mil,
$\varepsilon_r$ = 2.33, tan $\delta$ = 0.0012).

Fig. 5. Slot-coupled microstrip-fed patch antenna structure

First, parameterized circuit models were built for all substructures of the
circuit. For example, the corner-fed patch (figure 6) circuit model was built
over the following parameter range (table 1):



variable      min        max
L_patch       320 mil    400 mil
W_feed        5 mil      30 mil
f             5 GHz      15 GHz

Table 1. Parameter ranges of the corner-fed patch.

The automated modeling tool (ADS Model Composer) selected 25 data points
(= discrete layouts) in an adaptive way, and grouped all S-parameter data
in one global, compact, analytical model. ADS Momentum was used as
planar EM simulator [7]. The desired accuracy level was set to 55 dB. In
figure 7, the reflection coefficient $S_{11}$ of the corner-fed patch is shown as a
function of frequency and width.

Then, the parameterized circuit models were used to simulate the overall antenna
structure (figure 5). Figure 8 shows $S_{11}$ simulated with Momentum,
and with the new analytical circuit models for all sub-components (divide 
and conquer approach). Both results correspond very well. However, the sim- 
ulations based on the circuit models easily allow optimization and tuning, 
and took only a fraction of the time of the full wave simulation (2 s versus 96 
min on a 450 MHz Pentium II). 

4 Conclusions 

An advanced modeling technique was presented for building parameterized
models for general passive microwave and RF structures. The models are
based on full-wave EM simulations, and have a user-defined accuracy. Once
generated, the analytical models can be grouped in a library, and incorporated
in an EDA tool where they can be used for simulation, design and
optimization purposes. A patch antenna example was given to illustrate the
technique. The results based on the parameterized models correspond very
well with the global full-wave simulations. However, the time required for a
simulation using the compact analytical circuit models was only a fraction of
the time required for a global full-wave simulation.

Fig. 6. Layout of corner-fed patch

Fig. 7. Reflection coefficient $S_{11}$ of corner-fed patch ($W_{feed}$ = 8 mil)

Fig. 8. Reflection coefficient $S_{11}$ of slot-coupled microstrip-fed patch antenna



References 

1. Chaki S., Aono S., Andoh N., Sasaki Y., Tanino N., Ishihara O.: Experimental
Study on Spiral Inductors, Proceedings of the IEEE Symposium on Microwave
Theory and Techniques, (1995) pp. 753-756.

2. Liang J.F., Zaki K.A.: CAD of Microwave Junctions by Polynomial Curve
Fitting, Proceedings of the IEEE Symposium on Microwave Theory and
Techniques, (1993) pp. 451-454.

3. De Geest J., Dhaene T., Fache N., De Zutter D.: Adaptive CAD-Model Building
Algorithm for General Planar Microwave Structures, IEEE Transactions on
Microwave Theory and Techniques, vol. 47, no. 9, (1999) pp. 1801-1809.

4. Dhaene T., De Geest J., De Zutter D.: EM-based Multidimensional Parameterized
Modeling of General Passive Planar Components, Proceedings of the IEEE
Symposium on Microwave Theory and Techniques, vol. 3, (2001) pp. 1745-1748.

5. Dhaene T., Ureel J., Fache N., De Zutter D.: Adaptive Frequency Sampling
Algorithm for Fast and Accurate S-parameter Modeling of General Planar
Structures, Proceedings of the IEEE Symposium on Microwave Theory and
Techniques, (1995).

6. Beyer U., Smieja F.: Data Exploration with Reflective Adaptive Models,
Computational Statistics and Data Analysis, vol. 22, (1996) pp. 193-211.

7. Momentum software, Agilent EEsof Comms EDA, Agilent Technologies, Santa
Rosa, CA.




Simulation of Magnetic Circuits Including 
Hysteresis Nonlinearity 



Sinan Güngör and Saffet Altay

Istanbul Technical University, Dept. of Electrical Eng., 80626 Istanbul - Turkey

Abstract. Magnetic circuits can be represented with a topological dual circuit.
In the dual circuit, flux paths are modelled by hysteretic permeances instead of
reluctances. The hysteresis effect is taken into account by using the Jiles-Atherton (JA)
approach. In addition, iron losses due to eddy currents are included in the
model. Comparison of simulated results with experimental results from a core
type transformer demonstrates the capability of the proposed method.



1 Introduction 

Magnetic circuits can be represented with a topological dual circuit [1]. By
choosing the time derivative of flux ($d\phi/dt$) as potential quantity and the
mmf drop ($v$) as flow quantity, the flux paths are modelled by permeances.
The comparison of the reluctance and permeance networks is summarized in
Table 1. The mesh equations of the reluctance network are written according
to Ampere’s law. The total current enclosed by a mesh, $\sum i$, is considered as
the mmf source, and the mmf drop across a flux path is defined as $v = Hl$,
where $H$ is the magnetic field strength and $l$ is the length of the flux path. Flux
paths store magnetic energy and also have losses. The magnetic energy storage
property is not directly modelled by a reluctance network. In [2], a permeance
network for switched reluctance machines is given without considering the
magnetic hysteresis. In this work, the method is extended to include the
hysteresis nonlinearity in the model. Iron losses due to eddy currents are
also modelled separately. The network equations of the permeance network are a set
of differential equations which are to be integrated to determine the fluxes of
the network elements.

Table 1. Comparison of reluctance and permeance networks
($v = Hl$: mmf drop, $\phi$: flux, $i$: current)

Fig. 1. Topological construction of permeance network for a core type transformer.
(a) Transformer and original graph of the assumed flux path, (b)
Dual graph, (c) Permeance network.



2 Network Topology and Dual Networks 

A network is characterized by its topological graph. By means of a tree chosen
on the directed graph of the network, fundamental loops and fundamental cut-sets
can be constructed. By an appropriate numbering of the graph branches,
the fundamental loop matrix $B$ and the fundamental cut-set matrix $C$ can
be written in the following form:

$$B = [\,1_l \mid H\,], \qquad C = [\,-H^T \mid 1_t\,] \qquad (1)$$

where $t$ is the number of tree elements and $l$ is the number of link elements.

A network graph is defined as dual to another if its node
matrix $N$ is identical to the mesh matrix $M$ of the other network and
vice versa:

$$N_{dual} = M_{orig}, \qquad M_{dual} = N_{orig} \qquad (2)$$
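
A quick numerical check of the block structure of the fundamental loop and cut-set matrices: for any matrix H, the loop matrix B = [1_l | H] and the cut-set matrix C = [-H^T | 1_t] satisfy B C^T = 0, which expresses the orthogonality between fundamental loops and cut-sets. The sketch below assumes a small, arbitrarily chosen H with l = 2 links and t = 3 tree branches; the helper names are illustrative only.

```python
def loop_and_cutset(H):
    """Build B = [1_l | H] and C = [-H^T | 1_t] from the l-by-t matrix H."""
    l, t = len(H), len(H[0])
    # Row i of B: i-th unit vector of length l, followed by row i of H.
    B = [[1 if i == j else 0 for j in range(l)] + list(H[i]) for i in range(l)]
    # Row k of C: negated column k of H, followed by the k-th unit vector of length t.
    C = [[-H[i][k] for i in range(l)] + [1 if k == j else 0 for j in range(t)]
         for k in range(t)]
    return B, C


def times_transpose(X, Y):
    """Return the matrix product X * Y^T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in Y] for rx in X]


# Hypothetical example: 2 link branches, 3 tree branches.
H_example = [[1, -1, 0],
             [0, 1, 1]]
B_example, C_example = loop_and_cutset(H_example)
```

Each entry of `times_transpose(B_example, C_example)` reduces to -H[i][k] + H[i][k], so the product vanishes independently of the entries of H.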

3 Magnetic Circuit Modeling 

By means of a network tree on the dual graph of the flux paths (Fig. 1), the
fundamental mesh and cut-set equations can be written as follows:

$$[\,1_l \mid H\,]\,\frac{d}{dt}\,[\phi_1,\ldots,\phi_l,\phi_{l+1},\ldots,\phi_{l+t}]^T = 0 \qquad (3)$$

$$[\,-H^T \mid 1_t\,]\,[v_1,\ldots,v_l,v_{l+1},\ldots,v_{l+t}]^T = E\,i \qquad (4)$$

where $\phi_i$, $v_i$ are fluxes and mmf drops, $i$ is the current vector of the windings and
$E$ is the excitation matrix. These can be further reduced to the following
equation

$$\left[\Lambda_{d(l)} + H\,\Lambda_{d(t)}\,H^T\right]\frac{d}{dt}\,[v_1,\ldots,v_l]^T = -H\,\Lambda_{d(t)}\,E\,\frac{di}{dt} \qquad (5)$$

where

$$\Lambda_{d(l)} = \mathrm{diag}\!\left(\frac{d\phi_1}{dv_1}, \frac{d\phi_2}{dv_2}, \ldots, \frac{d\phi_l}{dv_l}\right)$$

and

$$\Lambda_{d(t)} = \mathrm{diag}\!\left(\frac{d\phi_{l+1}}{dv_{l+1}}, \frac{d\phi_{l+2}}{dv_{l+2}}, \ldots, \frac{d\phi_{l+t}}{dv_{l+t}}\right)$$

are the matrices of the differential permeances.

Fig. 2. Coupling to Electrical Network. Voltage dependent current sources represent
dynamic iron losses.

The integration of (5) yields the mmf drops of the link elements. Then
the mmf drops of the tree elements are calculated from the following equation:

$$[v_{l+1},\ldots,v_{l+t}]^T = H^T\,[v_1,\ldots,v_l]^T + E\,i \qquad (6)$$



By considering $B = \mu_0 (H + M)$, the differential permeances in (5) can be
approximated as

$$\Lambda_d = \frac{d\phi}{dv} = \frac{\mu_0 A}{l}\left(1 + \frac{dM}{dH}\right) \qquad (7)$$

where $A$ is the cross-sectional area of the flux path and the derivative $dM/dH$
is calculated by using the Jiles-Atherton hysteresis model [3]. The increment
of the fluxes is calculated as

$$\Delta\phi_i = \Lambda_{d,i}\,\Delta v_i \qquad (8)$$

For the kth integration step, one can write:

$$\phi_i^{k} = \phi_i^{k-1} + \Delta\phi_i \qquad (9)$$

Fig. 3. Modelling of iron losses. (a) Hysteresis and dynamic iron losses for sinusoidal
flux variation, (b) Equivalent circuit of the hysteresis and dynamic losses.



3.1 Coupling to Electrical Network 

The electrical equations of the windings are

$$u = R\,i + \frac{d}{dt}\,[\psi_1,\ldots,\psi_n]^T \qquad (10)$$

where $R$ is the diagonal matrix of the winding resistances and $n$ is the number of
windings. The flux linkages of the windings are written in terms of the fluxes
$\phi_i$:

$$[\psi_1,\ldots,\psi_n]^T = k^T\,[\phi_1,\ldots,\phi_{l+t}]^T \qquad (11)$$

As shown in Fig. 2, the permeance network can be coupled to the electrical
network via ideal transformers. Equations (5) and (10) are to be solved together.
Because the mmf drop has the dimension of current, it is represented hereafter
with the same symbol as the current, $i$.

Fig. 4. Total current $i_h$ obtained from measurement results and the magnetization
current $i_m$ calculated using JA model.

3.2 Iron Losses 

Iron losses, which dissipate as heat in the magnetic circuit material, are split into
magnetization hysteresis losses, associated with the pure magnetic hysteresis
loop, and dynamic iron losses due to local eddy currents. Hysteresis and
dynamic iron losses both contribute to the width of the hysteresis loop (Fig. 3).
The area of the magnetization hysteresis loop $\phi_h$-$i_m$ is proportional to the
magnetization hysteresis loss:

$$P_m = k_m \oint i_m \, d\phi_h \qquad (12)$$

The hysteresis loss $P_m$ is modelled with a hysteretic permeance. For the
mathematical representation of the hysteresis loop, the JA approach is used.
The dynamic iron losses are in turn divided into classical eddy
current losses $P_e$ and excess losses $P_x$. Classical eddy current losses are
proportional to the square of the rate of change in flux density.




however excess losses are expressed as follows [4]

Fig. 5. Measured hysteresis loops and the commutation curve (trajectory of loop
tips) at f = 50 Hz.



The parameter ke is calculated as 



kp = 



12p 



(15) 



where $l$ is the length of the flux path, $d$ is the thickness of the lamination, and $\rho$
is the resistivity of the material. The excess losses depend on specific material
parameters which are not supplied by manufacturers [5]; therefore $k_x$ is
determined from the experimental loss data. Eddy current and excess losses
are modelled with a voltage dependent current source connected in parallel
to the permeance of the corresponding flux path. Then the current due to
the excess losses is written:

$$i_x = i_h - i_m \qquad (16)$$

Fig. 4 shows the current $i_h$ obtained from measurement results and the
magnetization current $i_m$ calculated using the JA model.



4 Magnetization Hysteresis Modeling 

Experimental results (see Sect. 5) show that the commutation curves at different
measurement frequencies are coincident. On the commutation curve the
derivative $d\phi/dt$ and the mmf drop for dynamic iron losses are zero. Therefore,
the upper increasing part of the magnetization hysteresis loop is on the
commutation curve (Fig. 6). For the separation of the magnetization hysteresis
current, the magnetization hysteresis loops $\phi_h$-$i_m$ are modelled by the JA
approach. In the JA hysteresis model, the relationship between magnetization $M$
and magnetic field $H$ is described as follows:

$$\frac{dM}{dH} = \frac{1}{(1+c)}\,\frac{M_{an} - M}{k\delta - \alpha\,(M_{an} - M)} + \frac{c}{(1+c)}\,\frac{dM_{an}}{dH} \qquad (17)$$

where

$$M_{an} = M_s \left[ \coth\!\left(\frac{H + \alpha M}{a}\right) - \frac{a}{H + \alpha M} \right] \qquad (18)$$

is the Langevin anhysteretic curve. The symbol $\delta$ indicates a directional
parameter equal to +1 or -1 for the ascending or the descending branch of the
hysteresis loop.

The parameters of the JA model are determined such that the loop tips lie on
the commutation curve. For the magnetic core material
of the transformer used in the experiments they are identified as follows [4]: $M_s$ =
1.225 × 10^ A/m, $a$ = 425 A/m, $k$ = 36 A/m, $\alpha$ = 1.0 × 10^-, $c$ = 0.2.

Fig. 6. Hysteresis modeling with JA approach: The pure hysteresis loop modelled
by JA approach (inner loop), the measured hysteresis loop with dynamic iron losses
and the commutation curve, f = 50 Hz.
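
To make the use of the JA equations concrete, the following sketch evaluates the Langevin anhysteretic curve and the slope dM/dH, and traces an ascending branch with explicit Euler steps. Several points are assumptions and not taken from the paper: the exponents of $M_s$ and $\alpha$ are not legible in the source, so `Ms = 1.225e6` and `alpha = 1.0e-4` are placeholder values; the derivative of $M_{an}$ is approximated by its derivative with respect to the effective field $H + \alpha M$; and the function names and the Euler scheme are illustrative choices.

```python
import math

# a, k and c as quoted in the text; Ms and alpha are ASSUMED placeholder values.
PARAMS = dict(Ms=1.225e6, a=425.0, k=36.0, alpha=1.0e-4, c=0.2)


def langevin(He, Ms, a):
    """Anhysteretic magnetisation Ms*(coth(He/a) - a/He) at the effective field He."""
    x = He / a
    if abs(x) < 1e-4:
        return Ms * x / 3.0              # series limit avoids 0/0 near He = 0
    return Ms * (1.0 / math.tanh(x) - 1.0 / x)


def d_langevin(He, Ms, a):
    """Derivative of the anhysteretic magnetisation with respect to He."""
    x = He / a
    if abs(x) < 1e-4:
        return Ms / (3.0 * a)
    return Ms / a * (1.0 / x ** 2 - 1.0 / math.sinh(x) ** 2)


def dM_dH(H, M, delta, Ms, a, k, alpha, c):
    """Slope dM/dH of the JA model; delta is +1 (ascending) or -1 (descending)."""
    He = H + alpha * M
    Man = langevin(He, Ms, a)
    irreversible = (Man - M) / (k * delta - alpha * (Man - M))
    reversible = c * d_langevin(He, Ms, a)
    return (irreversible + reversible) / (1.0 + c)


def trace_ascending(H_max, steps, **params):
    """Integrate dM/dH from the demagnetised state up to H_max with explicit Euler steps."""
    H, M = 0.0, 0.0
    dH = H_max / steps
    for _ in range(steps):
        M += dM_dH(H, M, +1.0, **params) * dH
        H += dH
    return M
```

A call such as `trace_ascending(2000.0, 2000, **PARAMS)` returns the magnetisation reached at H = 2000 A/m for the assumed parameter set; the differential permeance of a flux path then follows from the slope returned by `dM_dH`.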



5 Experimental Results and Simulation 

A core type transformer is supplied from a synchronous generator with variable
voltage and frequency. The flux level is kept constant at different values
to limit the current at low frequencies ($U/f$ = constant). The current is
measured on the primary winding, which is supplied, and the voltage on the
secondary, which is open. The measured voltage and current are sampled by
means of a two channel digital oscilloscope with 20 MHz bandwidth and 8 bit
resolution. The $\phi_h$-$i_h$ loops of the core type transformer are calculated from the
measured voltage and current at the terminals of the transformer (Fig. 7).
Simulations are carried out by solving (5) and (10). The simulation results
are compared with the measured results (Fig. 7).

Fig. 7. Hysteresis loops in the transformer core (simulation result is only given for
the outermost loop), f = 50 Hz.



6 Conclusions 

The hysteresis effect in the magnetic circuit of a core type transformer is modelled
by using a dual magnetic equivalent circuit. The proposed equivalent
circuit, which consists of permeances, allows the magnetization
hysteresis loss and the dynamic iron losses to be modelled separately. For the modelling of the
magnetization hysteresis, the JA approach is used. The simulation results
show that the proposed method can be satisfactorily used to take into account
the hysteresis effect and iron losses in magnetic circuits. The inaccuracy
in simulation results which occurs at low flux density levels under 0.2 T can
be reduced by using a modified JA model.

Abstract. The field distribution at the ports of the transmission line structure is
computed by applying Maxwell’s equations to the structure. Assuming longitudinal
homogeneity, an eigenvalue problem can be derived whose solutions correspond to
the propagation constants of the modes. The nonsymmetric sparse system matrix
is complex in the presence of losses and of a Perfectly Matched Layer. The propagation
constants are found by solving a sequence of eigenvalue problems of modified matrices
with the aid of the invert mode of the Arnoldi method. Using coarse and fine grids,
and a new parallel sparse linear solver, the method, first developed for microwave
structures, can also be applied to high dimensional problems of optoelectronics.



1 Introduction 

The fields of applications are mobile communications, radio links, automobile 
radar systems, optical communications and material processing. The commer- 
cial applications of microwave circuits cover the frequency range between 1 
GHz and about 100 GHz, special applications in radioastronomy use even 
higher frequencies up to 1 THz. For optoelectronic devices, frequencies of
several hundred THz are common.

Basic elements of the structures are their transmission lines, whose prop- 
agation behavior has to be determined accurately. The propagation behavior 
of the transmission lines can be calculated by applying Maxwellian equations 
to the infinitely long homogeneous transmission line structure and solving an 
eigenvalue problem [1]. 

Only a few modes of smallest attenuation are able to propagate and have 
to be taken into consideration. Using a conformal mapping between the plane 
of propagation constants and the plane of eigenvalues the task is to compute 
all eigenmodes in a region, bounded by two parabolas. The region is covered 
by a number of overlapping circles. The eigenmodes in these circles are found 
solving a sequence of eigenvalue problems of modified matrices [2] with the 
aid of the invert mode of the Arnoldi iteration using shifts. 
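
The idea behind a shifted "invert mode" can be illustrated on a tiny example: applying the inverse of the shifted matrix repeatedly amplifies the eigenvector whose eigenvalue lies closest to the shift. The sketch below deliberately swaps the Arnoldi iteration for plain shifted inverse iteration (a single-vector relative of it) and uses a dense Gaussian elimination instead of a sparse solver; the function names and the 3-by-3 complex test matrix are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def eigen_near(A, shift, iters=60):
    """Return (eigenvalue, eigenvector) of A closest to `shift` by shifted inverse iteration."""
    n = len(A)
    shifted = [[A[i][j] - (shift if i == j else 0) for j in range(n)]
               for i in range(n)]
    x = [1.0 + 0j] * n
    for _ in range(iters):
        y = solve(shifted, x)                 # y = (A - shift*I)^{-1} x
        norm = sum(abs(v) ** 2 for v in y) ** 0.5
        x = [v / norm for v in y]
    # Rayleigh quotient recovers the eigenvalue from the converged vector.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    num = sum(xi.conjugate() * axi for xi, axi in zip(x, Ax))
    den = sum(abs(xi) ** 2 for xi in x)
    return num / den, x


# Hypothetical nonsymmetric complex matrix; its eigenvalues are the diagonal entries.
A_example = [[1 + 0.1j, 2, 0],
             [0, 3 - 0.2j, 1],
             [0, 0, -2 + 0.5j]]
```

Choosing a different shift selects a different eigenvalue of `A_example`, which mirrors how a set of shifts can cover a region of interest in the complex plane.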

For numerical treatment, the computational domain has to be truncated
by electric or magnetic walls or by a so-called absorbing boundary condition
simulating open space. A very efficient formulation for the latter case is the
Perfectly Matched Layer (PML) [3]. Introducing the complex, anisotropic
PML material leads to an increased computational time.

Since only small fractions of a microwave circuit can be
simulated, there is evident pressure towards larger problem sizes. In particular, the
application of the method to optoelectronic devices requires new strategies to
reduce the numerical effort and storage requirements. The computation of
large cross sections combined with an extension of specific material layers
in the sub-μm range yields high dimensional problems. Additionally, due to
the high wavenumber in semiconductor lasers, the region containing potential
propagating modes grows substantially. That means that a significantly higher
number of eigenvalue problems has to be solved. To reduce the execution
times, in a first step the problem is solved using a coarse grid in order to find
the approximate locations of the propagation constants of interest. The
accurate modes are calculated in a second step for an essentially reduced region
using a fine grid. In addition, the method is optimized, reducing the storage
requirement and the computing times, by applying a new sparse linear solver
that can be used in serial or in parallel.



2 Boundary Value Problem 



We start from a three-dimensional structure. The structure under investigation
can be described as an interconnection of infinitely long transmission
lines. The junction, the so-called discontinuity, may have an arbitrary structure.
The transmission lines have to be longitudinally homogeneous. Ports are
defined on the transmission lines. A three-dimensional boundary value problem
can be formulated using the integral form of Maxwell’s equations in the
frequency domain in order to compute the electromagnetic field:

$$\oint_{\partial\Omega} H \cdot ds = \int_{\Omega} j\omega[\epsilon]E \cdot d\Omega, \qquad \oint_{\cup\Omega} ([\epsilon]E) \cdot d\Omega = 0, \qquad (1)$$

$$\oint_{\partial\Omega} E \cdot ds = -\int_{\Omega} j\omega[\mu]H \cdot d\Omega, \qquad \oint_{\cup\Omega} ([\mu]H) \cdot d\Omega = 0, \qquad (2)$$

$$D = [\epsilon]E, \quad B = [\mu]H, \quad [\epsilon] = \mathrm{diag}(\epsilon_x, \epsilon_y, \epsilon_z), \quad [\mu] = \mathrm{diag}(\mu_x, \mu_y, \mu_z). \qquad (3)$$

In the equations on the left-hand side of (1) and (2), $\Omega$ is an open surface surrounded
by a closed contour $\partial\Omega$, while in the equations on the right-hand side of (1) and (2) $\cup\Omega$ is
a closed surface with an interior volume. The direction of the element $ds$ of
the contour $\partial\Omega$ is such that when a right-handed screw is turned in that
direction, it will advance in the direction of the vector element $d\Omega$.

The transverse electric mode fields at the ports are the solutions of an 
eigenvalue problem for the transmission lines. All other parts of the surface 
of the computation domain are assumed to be an electric or a magnetic wall. 
The PML’s are filled with an artificial material with complex anisotropic 
material properties. Therefore, the quantities are diagonal complex tensors. 
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3 Maxwellian Grid Equations 

The Maxwellian equations are discretized using staggered nonequidistant 
rectangular grids. Using the Finite Integration Technique (FIT) [4], [5], [6] 
with the lowest order integration formulae 

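$$\oint_{\partial\Omega} \mathbf{f}\cdot d\mathbf{s} \approx \sum (\pm f_i s_i), \qquad \int_{\Omega} \mathbf{f}\cdot d\mathbf{\Omega} \approx f\,\Omega \qquad (4)$$

equations (1), (2) are transformed into a set of Maxwellian grid equations 

$$A^T D_{s/\mu}\, b = j\omega\epsilon_0\mu_0\, D_{A_\epsilon}\, e, \qquad B D_{A_\epsilon}\, e = 0, \qquad (5)$$

$$A D_s\, e = -j\omega D_A\, b, \qquad \tilde{B} D_A\, b = 0. \qquad (6)$$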

The vectors $e$ and $b$ contain the components of the electric field intensity and 
the magnetic flux density of the elementary cells, respectively. The diagonal 
matrices $D_{s/\mu}$, $D_{A_\epsilon}$, $D_s$, and $D_A$ contain the information on cell 
dimension and material. $A$, $B$, and $\tilde{B}$ are sparse. 

Eliminating the components of the magnetic flux density from the two 
equations of the left-hand side of (5), (6) we get the system of linear algebraic 
equations 

$$(A^T D_{s/\mu} D_A^{-1} A D_s - k_0^2 D_{A_\epsilon})\, e = 0, \qquad k_0 = \omega\sqrt{\epsilon_0\mu_0}, \qquad (7)$$

which has to be solved subject to the boundary conditions; $k_0$ is the wavenumber 
in vacuum. 
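
The elimination step that leads to (7) can be written out in two lines. The following is only a sketch of that algebra, under the assumption that the diagonal matrix $D_A$ is invertible; the subscripted names of the diagonal matrices are the ones used in (5) and (6).

```latex
% Left equation of (6), solved for b (D_A is diagonal and assumed invertible):
b = -\frac{1}{j\omega}\, D_A^{-1} A D_s\, e .
% Insert into the left equation of (5) and multiply both sides by -j\omega:
A^T D_{s/\mu} D_A^{-1} A D_s\, e = \omega^2 \epsilon_0 \mu_0\, D_{A_\epsilon}\, e
\quad\Longrightarrow\quad
\left( A^T D_{s/\mu} D_A^{-1} A D_s - k_0^2 D_{A_\epsilon} \right) e = 0 ,
\qquad k_0^2 = \omega^2 \epsilon_0 \mu_0 .
```

Only the two curl-type equations are used here; the divergence equations on the right-hand side of (5), (6) are not needed for the elimination.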

4 Eigenvalue Problem 

The field distribution at the ports is computed assuming longitudinal homo- 
geneity for the transmission line structure. Thus, any field can be expanded 
into a sum of so-called modal fields which vary exponentially in the longitu- 
dinal direction 

$$\mathbf{E}(x,y,z) = \underline{\mathbf{E}}(x,y)\, e^{\mp j k_z z}. \qquad (8)$$

A substitution of ansatz (8) into the system of linear algebraic equations (7) 
and the elimination of the longitudinal electric field intensity components by 
means of the electric-field divergence equation $B D_{A_\epsilon} e = 0$ (see (5)) gives an 
eigenvalue problem 

$$C e = \gamma e, \qquad \gamma = -4\sin^2(h k_z). \qquad (9)$$

$e$ consists of components of the discretized eigenfunctions $\underline{\mathbf{E}}$. $2h$ is the length 
of an elementary cell in $z$-direction. The sparse matrix $C$ is in general 
nonsymmetric complex. The order of $C$ is $n = 2 n_x n_y - n_b$, where $n_x n_y$ is the number 
of elementary cells at the port. The size $n_b$ depends on the number of cells 
with perfectly conducting material. The relation between the propagation 
constants $k_z$ and the eigenvalues $\gamma$ is nonlinear, and can be expressed as 

$$k_z = \frac{j}{2h} \ln\left( \frac{\gamma}{2} + 1 + \sqrt{\frac{\gamma}{2}\left(\frac{\gamma}{2} + 2\right)} \right) = \beta - j\alpha. \qquad (10)$$

We are interested only in a few modes with the smallest attenuation. These 
are the modes with the smallest magnitude of imaginary part, but possibly 
with large real part of their propagation constant. The computation of all 
eigenvalues in order to find a few propagation constants must be avoided 
for the high-dimensional problem. For numerical treatment we have to limit 
the search for propagation constants by a maximum value $k_f$ of their real 
part. This $k_f$ value depends on the highest permittivity $[\epsilon]$ and permeability 
$[\mu]$ values of the waveguide, though regions with metallic or PML filling are 
ignored, see [2]. Using the limit $k_f$ and a preset maximum value $\alpha_m$ of 
the imaginary part of the propagation constants, the region containing the 
interesting constants is defined as a rectangle $F$ bounded by the lines 

$$\beta = \pm k_f \quad\text{and}\quad \alpha = \alpha_m. \qquad (11)$$

In an additional step all computed modes that are related to the PML 
boundary are neglected, using the power part criterion given in [8]. We can use 
the approximation $\sin(x) \approx x$ in (9) if we choose $h$ to be small enough, which 
is necessary anyway to get small discretization errors: 

$$\gamma = -4\sin^2(h k_z) \approx -4(h k_z)^2 = u + jv. \qquad (12)$$

With the aid of the approximation (12) we get a conformal mapping between 
the plane of eigenvalues ($\gamma$-plane) and the plane of propagation constants 
($k_z$-plane, see (10)): 

$$u = -4h^2(\beta^2 - \alpha^2), \qquad v = 8h^2\alpha\beta. \qquad (13)$$

Using this mapping the rectangle $F$ of the $k_z$-plane is transformed into a 
region $\tilde{F}$ of the $\gamma$-plane bounded by the two parabolas 

$$v = \pm 4hk_f\sqrt{u + 4h^2k_f^2} \quad\text{and}\quad v = \pm 4h\alpha_m\sqrt{-u + 4h^2\alpha_m^2}. \qquad (14)$$

That means, we have to find all eigenvalues of the region bounded by the 
parabolas. 
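
The mapping between the two planes is easy to check numerically. The sketch below is illustrative only: the function names and the sample numbers are hypothetical, $k_z$ is written as $\beta - j\alpha$, and the logarithmic inversion is the one obtained by solving (9) for $e^{-2jhk_z}$. It evaluates the exact relation (9), the small-$h$ approximation (12), and the residual of the first parabola in (14) for a point on the line $\beta = k_f$.

```python
import cmath

def gamma_exact(k_z, h):
    """Eigenvalue belonging to a propagation constant, relation (9)."""
    return -4.0 * cmath.sin(h * k_z) ** 2

def gamma_approx(k_z, h):
    """Small-h approximation (12): gamma ~ -4 (h k_z)^2 = u + j v."""
    return -4.0 * (h * k_z) ** 2

def kz_from_gamma(gamma, h):
    """Invert (9) with the logarithm form; the branch fixes the sign of k_z."""
    half = gamma / 2.0
    return 1j / (2.0 * h) * cmath.log(half + 1.0 + cmath.sqrt(half * (half + 2.0)))

def beta_parabola_residual(gamma, h, k_f):
    """v^2 - 16 h^2 k_f^2 (u + 4 h^2 k_f^2); zero on the image of beta = +-k_f."""
    u, v = gamma.real, gamma.imag
    return v * v - 16.0 * h * h * k_f * k_f * (u + 4.0 * h * h * k_f * k_f)

# Hypothetical sample values, chosen only to keep the arithmetic readable.
h, k_f, alpha = 0.1, 2.0, 0.3
g = gamma_approx(complex(k_f, -alpha), h)   # k_z = beta - j*alpha with beta = k_f
print(g, beta_parabola_residual(g, h, k_f))
```

Because (9) is even and periodic in $k_z$, the inversion recovers $k_z$ only up to sign and period; mapping the recovered value back through `gamma_exact` reproduces the original eigenvalue.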

5 Computation of Eigenmodes 

We need an algorithm that computes just a few selected eigenvalues and 
eigenvectors of a complex sparse matrix. A state-of-the-art algorithm for such 
problems is the Arnoldi method [9], [10]. In general the Arnoldi method 
converges for our problem only using the invert mode and looking for eigenvalues 
of largest magnitude. Thus, a simple way to find the eigenvalues located in 
the region $\tilde{F}$ would be to look for all eigenvalues of smallest magnitude, 
which are located in a circle centered on the origin and covering the region 
$\tilde{F}$. Owing to the high wavenumber $k_f$, the number of eigenvalues located in 
this circle is in general too large for a feasible computation using an iterative 
method. We can solve this problem by covering the region $\tilde{F}$ with $s > 1$ circles 
$C_i$, $i = 1(1)s$, centered on the $u$-axis and calculating the eigenvalues located 
in these circles. That is done in the following way. $s$ points 

$$P_i(\beta_i, \alpha_m), \quad i = 1(1)s, \quad \beta_1 = \bar\beta, \quad \beta_s = k_f, \quad\text{with}\quad \bar\beta = \sqrt{3}\,\alpha_m \qquad (15)$$

are defined on the interval $[0, k_f]$ of the line $\alpha = \alpha_m$. The distance between 
the points need not be equidistant and is controlled as shown below. The 
meaning of the distance $\bar\beta$ is also discussed below. The points $P_i$ are transformed 
into the points $\tilde{P}_i$ of the $\gamma$-plane. They are located on the parabola ((14), 
right formula). The $s$ circles $C_i$ of the $\gamma$-plane 



$$(u - m_i)^2 + v^2 = r_i^2, \qquad r_i = \sqrt{(\Re(\tilde{P}_i) - m_i)^2 + (\Im(\tilde{P}_i))^2}, \qquad i = 1(1)s, \qquad (16)$$



with 






are centered on the u-axis, covering the region bounded by the parabolas. 

In order to find all eigenvalues located in the circle $C_i$, $l$ points $Q_j$ are 
defined on the periphery of $C_i$. The matrix $C$ is extended by the diagonal 
matrix $Q$. The diagonal elements of $Q$ are the $l$ complex elements $Q_j$: 

$$\tilde{C} = \begin{pmatrix} C & \\ & Q \end{pmatrix}, \qquad Q = \mathrm{diag}(Q_1, \ldots, Q_l). \qquad (18)$$

The $s$ eigenvalue problems 

$$(\tilde{C} - m_i I)\, e = (\gamma - m_i)\, e, \qquad i = 1(1)s, \qquad (19)$$



are solved with the aid of the implicitly restarted Arnoldi method using the 
invert mode. The eigenvalue problems can be solved separately. The number 
$m$ of eigenvalues to be computed for a circle must be $l$ on the first call 
to the Arnoldi procedure. The main idea is to raise $m$ by $l$ until 
at least one value $Q_j$ has been found. But, since $m \ll n$ ($n$ being the order of 
the matrix $C$) is required for a feasible computation, one has to restrict the 
number $m$ of required eigenvalues by $m_{\max}$. If $m$ exceeds $m_{\max}$ in case of 
$i > 2$, we insert a new point between $P_{i-1}$ and $P_i$ and restart with $m = l$. 
The same procedure is used if a given number $\nu_{\max}$ of iterations in the 
Arnoldi method is exceeded. If the condition 



( 20 ) 
cannot be fulfilled, we restart with new parameters $m_{\max}$, $\nu_{\max}$ and possibly 
$\alpha_m$. If all eigenvalues $Q_j$ are found in case of $m > l$, we look for the eigenvalue 
$\gamma_{\max}$ of largest magnitude. If its magnitude exceeds the radius of $C_i$, a new, 
larger circle $\bar{C}_i$ with the same center as $C_i$ is defined. The left intersection 
point of this circle with the parabola ((14), right formula) is used as new point 
$\tilde{P}_i$ and $\Delta\beta = \Re(P_i) - \Re(P_{i-1})$ as distance for the next step. For the next 
circle, $m$ is reduced by the number of eigenvalues found outside the circle. 
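
The role of the marker values $Q_j$ can be illustrated with a toy model. The sketch below is not the Arnoldi-based procedure itself: it replaces the shift-invert solver by a sort of a known list of eigenvalues by distance from the circle center, which mimics the order in which the invert mode delivers them. All names (`eigenvalues_in_circle`, `spectrum`, and so on) are hypothetical, and the interval-splitting step is left to the caller.

```python
import cmath

def eigenvalues_in_circle(spectrum, center, radius, l=5, m_max=16):
    """Toy model of the marker strategy for one circle.

    `spectrum` stands in for the unknown eigenvalues of C. Returns the
    eigenvalues strictly inside the circle, or None if m would exceed m_max
    (the caller would then insert a point, shrink the circle and retry).
    """
    # l marker values Q_j placed on the periphery of the circle.
    markers = [center + radius * cmath.exp(2j * cmath.pi * j / l) for j in range(l)]
    tagged = [(z, False) for z in spectrum] + [(q, True) for q in markers]
    # The invert mode delivers the eigenvalues closest to the shift first.
    tagged.sort(key=lambda item: abs(item[0] - center))

    m = l
    while m <= m_max:
        found = tagged[:m]
        if any(is_marker for _, is_marker in found):
            # A marker was reached, so every eigenvalue closer than the
            # radius is already among the values found.
            return [z for z, is_marker in found
                    if not is_marker and abs(z - center) < radius]
        m += l  # raise m by l and ask for more eigenvalues
    return None

print(eigenvalues_in_circle([0.1, 0.2 + 0.1j, 0.5, 2.0, 3.0], 0.0, 1.0, l=2, m_max=10))
```

The design point is that a marker lies exactly on the periphery, so its appearance among the computed values certifies that no eigenvalue inside the circle has been missed.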

Separating the new values in each eigenvalue problem $i$, we are sure to 
have found all eigenvalues which are located in the corresponding circles $C_i$. 
Applying the mapping (13), the circles $C_i$ (see (16)) are transformed into 
Cassinian curves $\hat{C}_i$ 

$$(\beta^2 + \alpha^2)^2 + \frac{m_i}{2h^2}(\beta^2 - \alpha^2) = \frac{r_i^2}{16h^4} - \frac{m_i^2}{16h^4}, \qquad (21)$$



which cover the rectangle $F$ containing all desired propagation constants. 
Propagation constants outside of $F$ and PML modes are eliminated. The 
Cassinian curves $\hat{C}_i$, $i = 2(1)s$, consist of two separated ovals if $r_i < |m_i|$. 
Using $\bar\beta$ as minimum distance between the origin and $P_i$ (see (15)), other 
shapes of Cassinian curves (e.g. waisted ovals), which would lead to higher 
execution times, are avoided. 
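
The Cassinian form follows directly from inserting the mapping (13) into the circle equation (16). The short derivation below is a sketch of that step; the comparison with the standard Cassini oval uses auxiliary constants $a$ and $b$ that are introduced here only for illustration and assumes $m_i < 0$ (circle centers on the negative $u$-axis).

```latex
% Circle (16) with u and v replaced according to (13):
\left( -4h^2(\beta^2 - \alpha^2) - m_i \right)^2 + \left( 8h^2\alpha\beta \right)^2 = r_i^2 .
% Divide by 16h^4 and use (\beta^2-\alpha^2)^2 + 4\alpha^2\beta^2 = (\beta^2+\alpha^2)^2:
(\beta^2 + \alpha^2)^2 + \frac{m_i}{2h^2}\,(\beta^2 - \alpha^2)
  = \frac{r_i^2}{16h^4} - \frac{m_i^2}{16h^4} .
% Standard Cassini oval (\beta^2+\alpha^2)^2 - 2a^2(\beta^2-\alpha^2) = b^4 - a^4 with
a^2 = -\frac{m_i}{4h^2}, \qquad b^2 = \frac{r_i}{4h^2} ,
% which splits into two separate ovals exactly when b < a, i.e. r_i < |m_i| .
```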



6 Optoelectronic Devices 

The maximum cell size of the discretization should be less than $\lambda/10$, where $\lambda$ 
denotes the wavelength in the material with the highest $\Re(\epsilon)$. Substantially 
finer grids have to be used for regions of the circuit with highly variable 
electric fields. That means the problems become high dimensional, and only 
small fractions of a circuit can be simulated. The application of the 
method to optoelectronic devices in particular requires new strategies. The 
dimension of the eigenvalue problem to be solved increases substantially in this 
case because of the short wavelength. In addition, due to the high wavenumber in 
optoelectronic devices the length of the rectangle $F$ containing potential 
propagation constants grows substantially. That means we have to solve a 
significantly higher number of eigenvalue problems. Due to electric and magnetic 
walls behind the PML, undesired modes are generated inside the computation domain. 
These nonphysical modes can be eliminated by examining the eigenfunctions. 
Nevertheless, the number of eigenmodes to be calculated increases because of the 
shifted modes. Due to the significant difference between the magnitudes of the 
real and imaginary parts of the propagation constant, a high computational 
accuracy is required. To overcome these problems two strategies have 
been realized. 

(1) To reduce the execution times, in a first step the problem is solved using a 
coarse grid with lower accuracy requirements in order to find the approximate 
locations of the interesting propagation constants. The modes are then 
calculated in a second step for a substantially reduced region using a fine grid 
that fulfills higher accuracy requirements. 

(2) Because in general the Arnoldi method does not converge using the regular 
mode for our eigenvalue problem, the invert mode with shifting (see (19)) is 
applied. In this case a time- and memory-consuming system of linear algebraic 
equations has to be solved on each iteration step. The storage requirement 
and the computing times could be reduced substantially by applying the new 
linear sparse solver PARDISO [11], [12], rather than the formerly used 
UMFPACK [13]. The fill-in is reduced approximately by a factor of 4.75. Moreover, 
the dynamic memory allocation of PARDISO makes it possible to lower the memory 
requirements. The computing times for the numerical factorization and for the 
forward and backward solve are reduced on average by factors of 15 
and 4, respectively, for our problem. The algorithm is split into three phases: 
symbolic factorization, numerical factorization, and forward and backward 
solve. The symbolic factorization can be reused for all modified matrices of our 
problem. The numerical factorization has to be repeated for every new shift. 
The typical ratio of factorization time to solution time on a single CPU can 
be used to define $\nu_{\max}$ in the subinterval control process (see section 5). This 
ratio amounts to 20 on average. That means the cost of using $\nu_{\max} = 50$ 
Arnoldi iterations for the computation of $m$ eigenmodes in a circle $C_i$ defined 
by the points $P_{i-1}$, $P_i$ is comparable with the cost for two circles, 
obtained by inserting a point between $P_{i-1}$ and $P_i$, using $\nu_{\max} = 20$ 
iterations each. On the other hand, time is lost when the computation 
of $m$ eigenmodes is interrupted after $\nu_{\max} = 50$ iterations and a new iteration 
process is started for two reduced circles. Thus, we use a greater $\nu_{\max}$. 
Moreover, due to the significant difference between the length and the height of 
the rectangular region $F$ in the $k_z$-plane we have to solve a large number $s$ of 
eigenvalue problems (see section 7). In order to diminish this number we use 
Cassinian curves with relatively large diameters. That means a number of 
undesired eigenvalues outside of the area $F$ has to be calculated. In general the 
computation of a large number $m$ of eigenvalues in one circle needs more 
iterations than that of a small number. 
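
The cost comparison above can be reproduced with a very small model. The sketch below is a simplification under stated assumptions: cost is counted in units of one forward and backward solve, each circle needs one numerical factorization costing 20 such units (the average ratio quoted in the text), and the iteration counts are read as 50 and 20. The function name is hypothetical.

```python
def arnoldi_cost(circles, iterations, factor_to_solve_ratio=20.0):
    """Cost in units of one forward/backward solve.

    Each circle has its own shift, hence one numerical factorization, plus
    one solve per Arnoldi iteration. Everything else is ignored here.
    """
    return circles * (factor_to_solve_ratio + iterations)

one_large_circle = arnoldi_cost(circles=1, iterations=50)
two_small_circles = arnoldi_cost(circles=2, iterations=20)
print(one_large_circle, two_small_circles)   # 70.0 80.0
```

In this model the two variants cost 70 and 80 units, which is the sense in which they are comparable; an interrupted run of 50 iterations followed by two restarts would cost 70 + 80 units, which motivates a larger iteration limit.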

The $s$ eigenvalue problems (19) could be solved independently and in 
parallel using $s$ processors and the single-CPU mode of PARDISO. But that 
means the amount of memory increases nearly by a factor of $s$, and the 
maximum problem size which can be managed is reduced. In contrast, 
the parallel CPU mode of PARDISO provides an additional possibility to 
reduce the computing times for high dimensional problems on shared memory 
multiprocessors without substantial additional memory requirements. 



7 Laser Application 

As an example we have calculated the guided mode of an optoelectronic 
device. A so-called self-aligned stripe (SAS) laser is investigated, see Figure 1. 
This laser structure contains an additional, so-called antiguided layer (marked 
with $\epsilon_x = 12 - i * 0.1$ in Figure 1) outside the emitting stripe (marked with 
$\epsilon_x = 11.3 + i * 0.05$ in Figure 1). This high power laser diode excites only the 
fundamental mode; the active region is useful for wavelengths shorter than 
800 nm. The frequency is fixed to $299.7925 \cdot 10^{12}$ Hz. In our eigenmode 
computation of the laser structure a graded mesh of 283 times 345 elementary 
cells, including 10-cell PML regions, is used as a fine grid. The maximum cell 
size amounts to $\lambda/12 = 25$ nm, where $\lambda$ denotes the wavelength in the material 
with the highest $\Re(\epsilon)$. The minimum cell size is 1 nm. The maximum cell size 
is scaled down exponentially in the vertical direction near the 100 nm zones 
and in the horizontal direction near the material cut 118 and 119 (see Fig- 
ure 1). The dimension of the eigenvalue problem is 192423. The eigenvalues 
and eigenvectors have been solved with the relative accuracy tol = 10“^^, 
and with $m_{\max} = 16$, $l = 5$ (see section 5). 84 Cassinian curves have been 
used to cover the long, narrow region of the complex plane ($\alpha_m = 2500\ \mathrm{m}^{-1}$, 
$k_f = 21\,765\,592\ \mathrm{m}^{-1}$, see (11)) containing potential guided modes. A 
maximum number $\nu_{\max} = 200$ of Arnoldi iterations has been used. The total 
computational time amounts to approximately 3 h 23 min using a Compaq 
Professional Workstation XP1000 with a 667 MHz Alpha processor. 
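
The quoted bound $k_f$ can be cross-checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated explicitly in the text: that $k_f$ is the vacuum wavenumber multiplied by the square root of the largest real permittivity, and that this largest value is 12 (read off the layer labels of Figure 1). The vacuum wavelength of 1 μm corresponds to the stated frequency up to the rounding of the speed of light.

```python
import math

wavelength_vacuum = 1.0e-6   # m; assumed, c/f for the stated frequency
eps_real_max = 12.0          # assumed: largest Re(eps) among the layers

k_0 = 2.0 * math.pi / wavelength_vacuum              # vacuum wavenumber, 1/m
k_f = k_0 * math.sqrt(eps_real_max)                  # bound for Re(k_z), 1/m
wavelength_material = wavelength_vacuum / math.sqrt(eps_real_max)

print(round(k_f))                                    # 21765592
print(wavelength_material / 12 * 1e9)                # cell size in nm, about 24
```

Under these assumptions the result agrees with the quoted $k_f$ to the last printed digit, and one twelfth of the material wavelength is close to the 25 nm maximum cell size.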







Fig. 1. Laser (amplifier) 
One guided mode, corresponding to the laser's fundamental mode, was found. 
The computed complex propagation constant is $k_z = 20\,817\,578 + j\,1\,488$. 

A graded mesh of 121 times 127 elementary cells is used as a coarse grid. 
The maximum cell size amounts to 80 nm, and the minimum cell size to 4 nm. The 
dimension of the eigenvalue problem is 29 625. The total computational time 
amounts approximately 19 minutes using the relative accuracy tol = 10“^. 
The circle that contains the guided mode is known after this step. The time 
to find the accurate value $k_z$ using the fine grid amounts to only 142 s. Thus, 
the computational time is reduced to roughly 1/9 of its original value for the 
given structure. 
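
The reduction can be checked from the three timings quoted above. This is plain arithmetic on the stated values; the variable names are illustrative.

```python
fine_only_s = 3 * 3600 + 23 * 60   # 3 h 23 min with the fine grid alone
coarse_s = 19 * 60                 # coarse-grid search over all circles
fine_refine_s = 142                # fine-grid solve restricted to one circle

two_step_s = coarse_s + fine_refine_s
speedup = fine_only_s / two_step_s
print(two_step_s, round(speedup, 1))   # 1282 9.5
```

The two-step strategy therefore needs about 21 minutes in total, a little less than one ninth of the original run time.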
Abstract. A method to optimize delay and power dissipation in on-chip 
interconnect is reported. Propagation delay can be represented by the dominant time 
constant of the corresponding RC circuit or as a p-q% delay [1]. The optimization 
problem is formulated as a sequence of semi-definite programming problems. The 
method is applied to interconnect with inclusion of the fringing capacitance and 
capacitive coupling between wires. Shapes of single wires and models of real-life bus 
designs are optimized. It is shown that the optimal wire shape depends on the cho- 
sen delay metric and that it can be described accurately with a linear model. The 
differences between wire sizing and wire tapering are discussed. The importance of 
capacitive coupling in the optimization of multi-wire buses is demonstrated. Future 
extensions of the approach are discussed. 



1 Introduction 

As process technology scales into deep submicron dimensions, interconnect 
delay increasingly dominates over gate delay. This is because the trend of 
process technology scaling has led to the increase of the resistance per unit 
length of the interconnect, while the capacitance per unit length remains 
approximately constant [2]. The gate delay, on the other hand, decreases. In 
addition, whereas the bottom capacitance reduces, the capacitive coupling 
between neighboring wires is becoming increasingly important, because the 
wire spacings decrease [3]. Figure 1 depicts the capacitances in interconnect 
[1]. 




Fig. 1. Interconnect capacitances. The symbols Cb, C'eb, and Cec denote the bot- 
tom, fringing, and coupling capacitances, respectively [1]. 
Therefore, deep submicron design shows a growing interest in optimization 
techniques that size wires and/or insert and size buffers [3,4]. Many of 
these methods apply the Elmore delay model to determine the signal propa- 
gation delay [1,4]. Unfortunately, Elmore delay ignores the resistive shielding 
of downstream capacitance and is therefore unacceptably inaccurate in some 
cases. Furthermore, the application of Elmore delay is restricted to resistor- 
capacitor (RC) trees, which means that non-grounded capacitances cannot 
be handled. This restriction inhibits the inclusion of coupling capacitances 
between neighboring wires [5]. Capacitive coupling, however, has to be incor- 
porated if cross-talk effects on signal delays are to be taken into account. 

An alternative delay metric for RC circuits was proposed by Vanden- 
berghe et al. [5], who used the dominant time constant (Tdom) of the circuit 
as a measure of signal delay. Circuit sizing problems can then be solved with 
a semidefinite programming (SDP) method. Their approach was shown to 
be applicable to general nontree topologies, such as meshes of resistors and 
buses with coupling capacitances between the wires. 

However, problems in which delay is minimized were not considered in 
[5], because these cannot be directly expressed as semidefinite programming 
problems. Instead, examples were studied in which $T_{dom}$ is constrained. 
Second, only the largest time constant was taken into account; the effect of the 
other time constants was neglected. Furthermore, the applied wire model 
did not include fringing capacitance, which is a non-negligible component in 
deep-submicron technologies, see Fig. 1. Also, the values used for the model 
parameters were not related to real-life process technologies. 

In the current work, the approach of Vandenberghe et al. [5] is extended to 
include problems in which the propagation delay is minimized. The extended 
method is capable of optimizing Tdom or other delay metrics in which all time 
constants are incorporated. An interconnect model is used in which fringing 
capacitances are included and the applied parameter values correspond to an 
existing 0.18 μm process technology. The approach is used to study optimal 
wire shapes of single wires and the optimization of multi-wire bus models. 
Future extensions of the method are treated. 



2 Delay Optimization 

The approach considers general RC circuits composed of two-terminal resis- 
tors and capacitors, and independent voltage sources. If the branch capacitors 
and conductances in the circuits are nonnegative, the capacitance matrix (C) 
and the conductance matrix (G) describing the circuit are positive 
semidefinite. Further, if the branch capacitors and conductances are affine 
functions of some design parameters $x \in \mathbb{R}^m$, the matrices C and G are also: 

$$C(x) = C_0 + x_1 C_1 + \cdots + x_m C_m, \qquad (1)$$

$$G(x) = G_0 + x_1 G_1 + \cdots + x_m G_m. \qquad (2)$$
Typically, the parameters x correspond to widths of wire segments and 
transistors and the spacings between wire segments. 

The dominant time constant of the circuit can be expressed as [5] 

$$T_{dom}(x) = \min\{\, T \mid T\,G(x) - C(x) \succeq 0 \,\}. \qquad (3)$$

The inequality means that the left-hand side is a positive semidefinite ma- 
trix. The work in [5] considered problems that can be expressed as SDP 
problems, such as the minimization of area, dynamic power dissipation, or 
bus width, with a linear matrix inequality (LMI) constraining the dominant 
time constant. In the present work problems have been studied that can be 
cast as generalized eigenvalue minimization problems (GEVPs), such as the 
minimization of the dominant time constant, subject to variable bounds, 

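$$\begin{array}{ll} \text{minimize} & T \\ \text{subject to} & T\,G(x) - C(x) \succeq 0 \\ & x_i^{\min} \le x_i \le x_i^{\max}, \end{array} \qquad (4)$$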

with variables T and x. Additional constraints on the wire widths and spac- 
ings can be expressed as LMIs also. This is the case, for instance, when a set 
of parallel wires is optimized [5]. 
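
For a fixed design vector, definition (3) can be evaluated without an SDP solver. The sketch below is an illustration under simplifying assumptions, not the method used in the paper: it takes ready-made matrices G and C (as lists of lists), assumes G is positive definite, and finds the smallest T for which T G - C is positive definite by bisection with a plain Cholesky test. The function names are hypothetical.

```python
def is_positive_definite(a):
    """Cholesky test for a symmetric matrix given as a list of lists."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return False
        low[j][j] = pivot ** 0.5
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return True

def dominant_time_constant(g, c, rel_tol=1e-10):
    """Smallest T with T*G - C positive (semi)definite, found by bisection."""
    n = len(g)

    def feasible(t):
        return is_positive_definite(
            [[t * g[i][j] - c[i][j] for j in range(n)] for i in range(n)])

    lo, hi = 0.0, 1.0
    while not feasible(hi):          # grow the bracket until T*G - C > 0
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:    # shrink the bracket around the boundary
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Two-node RC ladder with unit resistors and capacitors (illustrative values).
print(dominant_time_constant([[2.0, -1.0], [-1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

For the two-node example the result is the largest eigenvalue of G^{-1} C, which equals (3 + sqrt(5))/2; the strict definiteness test differs from the semidefinite condition in (3) only at the boundary point itself.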

The most convenient way to solve a GEVP is to solve a sequence of 
SDPs [6]. If T is fixed, the problem of (4) becomes a feasibility problem with 
variables x, which can be solved by solving the SDP 

$$\begin{array}{ll} \text{minimize} & w \\ \text{subject to} & T\,G(x) - C(x) + wI \succeq 0 \\ & x_i^{\min} \le x_i \le x_i^{\max}. \end{array} \qquad (5)$$

The constraints in (4) are fulfilled if and only if the solution $w^*$ of (5) is 
non-positive. The GEVP of (4) can therefore be solved by finding the value 
of T for which $w^*$ is equal to zero. 
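
The link between the auxiliary problem (5) and the feasibility of (4) can be made explicit. The following lines are a sketch of the reasoning, using only the definitions above; $\lambda_{\max}$ denotes the largest eigenvalue of a symmetric matrix.

```latex
% For fixed T and x, the matrix inequality in (5) holds exactly when
w \ge \lambda_{\max}\bigl( C(x) - T\,G(x) \bigr),
% so the optimal value of (5) is
w^*(T) = \min_{x^{\min} \le x \le x^{\max}} \lambda_{\max}\bigl( C(x) - T\,G(x) \bigr).
% Hence w^*(T) \le 0 iff some admissible x satisfies T\,G(x) - C(x) \succeq 0 .
```

Since $G(x)$ is positive semidefinite, $w^*(T)$ does not increase with $T$, which is why a one-dimensional search for the zero of $w^*$ (for example by bisection) locates the optimum of (4).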

The approach of (5) is general, i.e., delay metrics other than the dominant 
time constant can be minimized as well. A possible alternative is the p-q% 
propagation delay, $D(T)$, defined as the difference between the times at which 
the response to a step input signal has completed p% and q% of its transition, 
with $q > p$ [1]. The problem is then cast as 

$$\begin{array}{ll} \text{minimize} & D(T) \\ \text{subject to} & w^*(T) \le 0, \end{array} \qquad (6)$$

where $w^*(T)$ is the solution of (5) for given T and $D(T)$ is calculated by 
including all time constants of the circuit. In the case of a rising transition, 
the p-q% propagation delay can be expressed as 

$$D(T) = \min\{\, T_q \mid |v_{out}(t;T)| \ge \tfrac{q}{100} V_{dd} \text{ for } t \ge T_q \,\} - \max\{\, T_p \mid |v_{out}(t;T)| \le \tfrac{p}{100} V_{dd} \text{ for } t \le T_p \,\}, \qquad (7)$$

where $v_{out}(t;T)$ denotes the voltage at the receiver node, which can be 
computed from $C(x)$ and $G(x)$ [5], and $V_{dd}$ is the supply voltage. In the following 
sections, p = 0 and q = 80 will be used. 
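
For intuition, the p-q% delay has a closed form when the response has a single time constant. The sketch below assumes a step response of the form Vdd (1 - exp(-t / tau)), which is a simplification: the delay metric in the text includes all time constants of the circuit. The function name is hypothetical.

```python
import math

def pq_delay_single_pole(tau, p=0.0, q=80.0):
    """p-q% delay of v(t) = Vdd * (1 - exp(-t / tau))."""
    if not 0.0 <= p < q < 100.0:
        raise ValueError("need 0 <= p < q < 100")
    t_p = -tau * math.log(1.0 - p / 100.0)   # time at which p% is reached
    t_q = -tau * math.log(1.0 - q / 100.0)   # time at which q% is reached
    return t_q - t_p

print(pq_delay_single_pole(1.0))   # ln(5), about 1.609
```

With p = 0 and q = 80 the delay is tau ln 5, so for a single-pole response the 0-80% delay and the time constant differ only by a constant factor; differences between the two metrics arise from the remaining time constants.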

In the current work, the SeDuMi toolbox [7] has been applied to solve 
SDPs such as (5). Additionally, a nonlinear constrained optimization algo- 
rithm that was developed recently in our department [8,9] has been used to 
solve (6). 



3 Interconnect Modeling 

Each wire segment i is modeled as a π-type RC circuit (a resistance $r_i$ between 
two capacitors with value $c_i/2$ to ground). The resistance and capacitance 
parameters $r_i$ and $c_i$ are functions of the design parameters. If $l_i$ and $w_i$ 
denote the length and the width of the segment, respectively, then, in the 
case of a single wire, the values of $r_i$ and $c_i$ are given by 

$$r_i = R_s\, l_i / w_i, \qquad (8)$$

$$c_i = (C_b w_i + C_f)\, l_i, \qquad (9)$$

where $R_s$, $C_b$, and $C_f$ denote the sheet resistance, the unit area bottom 
capacitance, and the unit length fringing capacitance, respectively. The fringing 
capacitance has a component that depends on the wire spacing. The most 
accurate results are obtained if the spacing-dependent term of the fringing 
capacitance is added to the coupling capacitance between the wires, which 
also depends on spacing, and the sum of the two is modeled. 
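
The segment model translates directly into the matrices G and C of section 2. The sketch below is a minimal illustration with hypothetical names and arbitrary parameter values: it assumes a sheet-resistance law for the segment resistance and a bottom-plus-fringing law for the segment capacitance, a driver modeled as a resistance to an ideal source, and no coupling to neighboring wires.

```python
def wire_matrices(widths, seg_len, r_sheet, c_bottom, c_fringe, r_driver):
    """Conductance and capacitance matrices of a wire made of pi-segments.

    Node 0 is the driver output; node k + 1 is the far end of segment k.
    """
    n = len(widths) + 1
    g = [[0.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    g[0][0] += 1.0 / r_driver                  # driver resistance to the source
    for k, w in enumerate(widths):
        cond = w / (r_sheet * seg_len)         # 1 / r_k, linear in the width
        cap = (c_bottom * w + c_fringe) * seg_len
        a, b = k, k + 1                        # segment k joins nodes a and b
        g[a][a] += cond
        g[b][b] += cond
        g[a][b] -= cond
        g[b][a] -= cond
        c[a][a] += cap / 2.0                   # half of c_k at each end
        c[b][b] += cap / 2.0
    return g, c

g, c = wire_matrices([1.0, 2.0], seg_len=1.0, r_sheet=1.0,
                     c_bottom=1.0, c_fringe=0.5, r_driver=1.0)
print(g)
print(c)
```

Both the segment conductance and the segment capacitance are affine in the width, so matrices assembled this way have the affine form required for the semidefinite formulation.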

Figure 2 illustrates that the length of the wire segments has only a rela- 
tively weak effect on the accuracy of the model. Tdom is plotted as a function of 
the number of segments for a 10 mm long wire. The curve converges rapidly 




Fig. 2. Effect of the segment length on the accuracy of the computed delay for a 
10 mm long wire. 
towards an almost constant value. In the present work, a typical segment 
length of 1 mm was used, resulting in an accuracy within 0.1%. 

It is assumed that the signal wires are driven by a driver with a strength 
of $R_d = 100\ \Omega$. The capacitive load connected to the receiver end of the wires 
is neglected, which is legitimate for long wires. 

4 Optimization of a Single Wire 

Figure 3 shows wire shapes optimized with respect to $T_{dom}$ and to $D(T)$ 
(0-80% delay). Minimizing $D(T)$ results in a blunter wire shape, in which 
the capacitive load at the receiver end is higher, while the resistance of the 
downstream path is lower. The values of $D(T)$ and $T_{dom}$ are 2.1% lower and 
1.9% higher, respectively, compared to the case where $T_{dom}$ is minimized. 
The dynamic power dissipated by a signal propagation, expressed as $P_{dyn} = 
\mathbf{1}^T C(x)\, \mathbf{1}$, is 3.4% higher, because a larger capacitance has to be charged. 
Significant differences between the optimization results for the two delay 
metrics are obtained only for very long wires with a high upper bound to the 
wire width. 





(a) 



(b) 



Fig. 3. Optimal wire shapes resulting from the minimization of (a) Tdom and (b) 
D{T). Signals are propagated from left to right. 



Theoretical studies [10,11] have resulted in expressions for optimal wire 
shapes with respect to (Elmore) delay. Our method provides a tool to optimize 
wire shapes and to compare the results with analytic models. We investigated 
a linear model for the wire width, 

$$w(z) = \min\left( w_{\max}, \max\left( w_{\min}, w_0 + \kappa z \right) \right), \qquad (10)$$

in which the offset $w_0$ and the slope $\kappa$ are expressed in terms of $L_w$, $w_{\max}$, 
and $R_d$ with fit parameters $a_i$ and $b_i$, and where $z$ denotes the position along 
the wire, $L_w$ is the wire length, $w_{\max}$ is the upper bound to the width, and 
$R_d$ denotes the driver strength. 
The parameters $a_i$ and $b_i$ were determined by least-squares fits to the optimal 
wire shapes for various combinations of $L_w$, $w_{\max}$, and $R_d$ values. It was found 
that the delay and dynamic power dissipation computed from the linear 
model (10) agreed within 1% with results for the optimized wire shapes, 
for practical ranges of $L_w$, $w_{\max}$, and $R_d$. This shows that the complicated 
expressions of [10,11] are not necessary to accurately model wire shapes. 
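
The clamped linear profile is simple to evaluate per segment. The sketch below covers only the clamping rule of the linear model; the offset and slope are passed in as plain numbers, since their fitted dependence on wire length, width bound, and driver strength is not reproduced here. All names and sample values are hypothetical.

```python
def wire_width(z, w0, kappa, w_min, w_max):
    """Linear width profile w0 + kappa * z, clamped to [w_min, w_max]."""
    return min(w_max, max(w_min, w0 + kappa * z))

def segment_widths(n_segments, seg_len, w0, kappa, w_min, w_max):
    """Width of each segment, sampled at the segment midpoint."""
    return [wire_width((k + 0.5) * seg_len, w0, kappa, w_min, w_max)
            for k in range(n_segments)]

# A wire that starts at the upper bound and tapers towards the receiver.
print(segment_widths(4, 1.0, w0=2.0, kappa=-0.5, w_min=0.5, w_max=1.5))
```

With a negative slope the profile is wide at the driver and narrow at the receiver, and the two bounds produce flat sections at either end.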

Figure 4 shows that wire tapering, in which the width of each wire seg- 
ment is varied independently, offers only a relatively small advantage over 
wire sizing, in which all segments have equal width. The performance of 
tapered and sized wires differs significantly only if the upper bound on the wire 
width, $w_{\max}$, is loose. This observation is in agreement with other studies [2] 
comparing wire sizing and wire tapering. 




(a) 




(b) 



Fig. 4. (a) Wire tapering (upper drawing) compared with wire sizing (lower 
drawing). (b) Performance as a function of the upper bound ($w_{\max}$) on the width 
of the wire segments. 



5 Optimization of Bus Models 

Our method can also be applied to multi-wire models that are used for 
real-life bus designs. Figure 5 shows the results of the optimization of $D(T)$ for a 
bus consisting of three parallel wires, where a signal is propagated through the 
middle wire. In Fig. 5(a), the signal is “boosted” by letting the neighboring 
wires carry the same signal [12]. Cross-talk between the wires then speeds 
up the signal propagation. In Fig. 5(b), it is assumed that the neighboring 
wires switch in the opposite direction. In this case, the capacitive coupling 
between the wires is effectively doubled, due to the Miller effect. While the 
optimal width and shape of the middle wire are approximately the same, the 
wire spacing is increased at the cost of narrower neighboring wires. For the 
bus model used here, situation (b) results in a 42% increase in propagation 
delay compared to the boosted situation (a). 





(a) 



(b) 



Fig. 5. Optimal wire shapes for a three-wire bus in the case that (a) the central wire 
is boosted by the neighbors and (b) the neighbors switch in the opposite direction. 



6 Future Extensions 

The impact of inserting repeaters in long wires can be increased by optimizing 
the repeater sizes. Because the delay of a repeater depends on the capacitive 
load of the wire connected to its output, interconnect including repeaters can 
be optimized by resizing the repeater after each iteration in the sequence of 
SDPs that is used to solve (4) or (6). Also the optimization of shielding wires 
can be used to further decrease signal delay. 

As in [5], in the present work it is assumed that the signal wire is stimu- 
lated by a step input signal. However, responses to input signals with different 
shapes can be studied as well. Also, the approach can be further extended 
by including optimization with respect to the input delays of the signals in a 
multi-wire bus [13]. 

7 Conclusions 

In the present work, we have demonstrated that the semidefinite program- 
ming (SDP) approach to interconnect sizing, as proposed by Vandenberghe 
et al. [5], can be successfully extended to include the minimization of the 
dominant time constant or more general delay metrics. Using a wire model 
including fringing capacitance and realistic parameter values, the method has 
been applied to study optimal wire sizes and the optimization of a multi-wire 
bus model. The minimization of either the dominant time constant $T_{dom}$ or 
the more general 0-80% delay provides different results, in particular for very 
long and wide wires. A simple linear expression for the optimal wire shape 
provides results that agree within 1%. Complicated wire-shape expressions 
reported elsewhere [10,11] are therefore of limited value in practical situa- 
tions. Wire tapering results in shorter delay times than wire sizing, but the 
effect is only significant if the upper bound on the wire width is loose. The 
optimization of a three-wire bus model shows the important effect of 
capacitive coupling on propagation delay and the optimal wire sizes and spacing. 
In summary, the presented approach offers a general and flexible framework 
for accurate studies on the sizing of various types of interconnect. 
Abstract. The paper introduces a systematic procedure to derive the expression of 
the Maxwell stress tensor associated with a given expression of the electromagnetic 
energy density. 



1 Introduction 

There exist numerous formulae for the computation of electromagnetic (EM) 
forces. They can be sorted into two distinct families, depending on whether 
they are based on the definition of a Maxwell stress tensor or on the appli- 
cation of the virtual work principle. A look into the literature shows that 
expressions of Maxwell stress tensors (See e.g. [1]) are generally obtained by 
algebraic and differential operations, starting from the Maxwell equations 
and assuming an a priori knowledge of the expression of the force density 
in the medium under consideration. The virtual work principle, on the other 
hand, relies more clearly in theory on the required energy concepts [2-4], but 
the formulae proposed in [5-7] for its implementation are all obtained in a 
roundabout way involving the Jacobian matrix of a mapping at the finite 
element level. In both cases, the underlying thermodynamic concepts are buried 
under an overwhelming amount of algebra. 



2 The Euclidean case 

The first step towards a thermodynamic analysis of an electromechanical 
system is to define the magnetic and mechanical state variables in such a 
way that they are independent of each other. Whereas it sounds obvious to 
anybody that one can freely modify the magnetic field¹ in a system without 
deforming it, by increasing the imposed currents for instance, it is much 
less clear to imagine how the system can be deformed without modifying 
the magnetic field. One feels indeed that any deformation of the system will 
affect the magnetic field. 

Let $M$ first be the material manifold, i.e. a continuous set of points each 
representing a material particle of a given electromechanical system. Let 
$C_1(M)$ be the set of all regular curves in $M$ and $C_2(M)$ be the set of all 
regular surfaces. Let $E$ be the Euclidean space. Following [8], the 
magnetic state variable of the electromechanical system is defined by the magnetic 
flux map 

$$\phi : C_2(M) \to \mathbb{R} \qquad (1)$$

that associates a real number, the magnetic flux, to any surface in $M$. 
Similarly, the kinematics of the system is defined by the placement map 

$$p : M \to \Omega \subset E, \qquad (2)$$

which associates to any point of $M$ its position in $E$. 

¹ Considered in this section as a vector field, not as a differential form. 

Although the magnetic flux map (1) completely determines the fluxes in
the system, it does not give the local value of the induction field. The latter
is a secondary quantity that requires the definition of an interpolation
operator, denoted by $b(\phi, p)$, as it may depend on both $p$ and $\phi$. This is
the reason why the interpolated induction field is not suitable as a primary
variable in a thermodynamic representation. The properties of this
vector-valued interpolation operator are not trivial. It may involve, for
instance, the selection of a set of particular facets for the representation
of the field, and an accuracy and convergence analysis. The interpolation with
Whitney facet elements in a mesh is an example of such an interpolation tool.
Another example is given below. A similar interpolation operator, denoted by
$x(p)$, is associated with the placement map.

Since the maps $\phi$ and $p$ are independent of each other, they are suitable
variables for the definition of the energy functional $\Psi$ of the system. One has

$$\Psi(\phi, p) = \int_{\Omega(p)} \rho^{\Psi}\big(b(\phi, p), x(p), p\big) \qquad (3)$$

where the energy density $\rho^{\Psi}$ depends on the interpolated vector fields $b$
and $x$.

If the problem is more easily posed in terms of the magnetic field $h$ than
in terms of $b$, the available thermodynamic state function is the coenergy
functional

$$\Phi(I, p) = \int_{\Omega(p)} \rho^{\Phi}\big(h(I, p), x(p), p\big) \qquad (4)$$

with

$$I : C_1(M) \to \mathbb{R} \qquad (5)$$

the magnetomotive force map, which associates a real number, the
magnetomotive force, to any curve in $M$, and $h(I, p)$ the interpolation
operator for the magnetic field.

The definition of forces now follows from the variation of those energy
functionals, $\delta\Psi(\phi, p)|_{\delta\phi = 0}$ or $\delta\Phi(I, p)|_{\delta I = 0}$, and the factorization in the
form of a mechanical work monomial (e.g. $F \cdot \delta x$, $\sigma : \delta\varepsilon$, $f \cdot \delta x$, ...). If
total energy functionals are used, the computed forces will be total forces, i.e.
of mechanical and electromagnetic nature together. In order to define
specifically electromagnetic forces, we have to use electromagnetic energy
functionals instead of total energy functionals. The definition of such
restricted functionals is however not an obvious matter, for instance in a
magnetostrictive material. But, as this is not the topic of this paper, it
will be assumed in the following that all energy functionals are
electromagnetic ones.

2.1 Application at the local level 

Let us consider as material manifold $M$ a unit cube with coordinates $(\alpha, \beta, \gamma)$,
Fig. 1. Let $O = (0, 0, 0)$, $A = (1, 0, 0)$, $B = (0, 1, 0)$ and $C = (0, 0, 1)$ be four
particular points of $M$. The interpolated field $x(p)$ is defined by the affine
combination of the placement of those four points

$$x = (1 - \alpha - \beta - \gamma)\, p(O) + \alpha\, p(A) + \beta\, p(B) + \gamma\, p(C). \qquad (6)$$

This determines in $E$ a parallelepiped region $p(M) = \Omega$ of volume

$$V = (r \times s) \cdot t = (s \times t) \cdot r = (t \times r) \cdot s, \qquad (7)$$

where $r = p(A) - p(O)$, $s = p(B) - p(O)$ and $t = p(C) - p(O)$ are three
linearly independent vectors of $E$. One can check that they verify

$$(r \times s)\, t + (s \times t)\, r + (t \times r)\, s = V\, I \qquad (8)$$

where $I$ is the identity matrix. Note the use of the dyadic (undotted) product.




Fig. 1. Theoretical setup for the electromechanical coupling in a continuous 
medium. 



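Taking the gradient of (6), one gets $\nabla x = I = \nabla\alpha\, r + \nabla\beta\, s + \nabla\gamma\, t$, and
after identification with (8),

$$V \nabla\alpha = s \times t, \qquad V \nabla\beta = t \times r, \qquad V \nabla\gamma = r \times s. \qquad (9)$$

The parallelepiped $\Omega$ is deformed by perturbing the placement of the
points $A$, $B$ and $C$. The displacement field in $\Omega$ and its gradient are

$$u = \delta x = \delta p(O) + \alpha\, \delta r + \beta\, \delta s + \gamma\, \delta t, \qquad \nabla u = \nabla\alpha\, \delta r + \nabla\beta\, \delta s + \nabla\gamma\, \delta t. \qquad (10)$$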
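The geometric relations (7) to (9) are easy to check numerically. The following is a minimal sketch in plain Python; the edge vectors `r`, `s`, `t` are arbitrary illustrative values and the helper names (`cross`, `dot`, `dyad`, `add`) are introduced here only for the example.

```python
# Numerical check of relations (7)-(9) for an arbitrary parallelepiped.
# The edge vectors below are illustrative values, not taken from the paper.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dyad(a, b):
    """Dyadic (undotted) product a b as a 3x3 nested list."""
    return [[ai * bj for bj in b] for ai in a]

def add(*mats):
    return [[sum(m[i][j] for m in mats) for j in range(3)] for i in range(3)]

r = (1.0, 0.2, 0.0)
s = (0.1, 1.5, 0.3)
t = (0.0, 0.4, 2.0)

# Equation (7): the three triple products give the same volume V.
V = dot(cross(r, s), t)
volumes = (V, dot(cross(s, t), r), dot(cross(t, r), s))

# Equation (8): (r x s) t + (s x t) r + (t x r) s = V I.
lhs8 = add(dyad(cross(r, s), t), dyad(cross(s, t), r), dyad(cross(t, r), s))

# Equation (9): gradients of the material coordinates.
grad_alpha = tuple(c / V for c in cross(s, t))
grad_beta = tuple(c / V for c in cross(t, r))
grad_gamma = tuple(c / V for c in cross(r, s))

# Gradient of (6): grad(alpha) r + grad(beta) s + grad(gamma) t = I.
grad_x = add(dyad(grad_alpha, r), dyad(grad_beta, s), dyad(grad_gamma, t))
```

Here `lhs8` should equal `V` times the identity and `grad_x` the identity itself, up to rounding.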
2.2 Formulation in b

Let $\phi_\alpha$, $\phi_\beta$ and $\phi_\gamma$ be the fluxes obtained by applying the magnetic flux map
(1) to the particular parallelogram facets $OBC$, $OCA$ and $OAB$ of $M$. If
the parallelepiped is small enough, the (uniform) induction field in $\Omega$ is by
definition the vector $b$ that verifies the left-hand equation in (11). The
right-hand equation is the expression of the interpolated induction field
$b(\phi, p)$, which is obtained by inversion of the former equation.

$$\begin{bmatrix} \phi_\alpha \\ \phi_\beta \\ \phi_\gamma \end{bmatrix} = \begin{bmatrix} s \times t \\ t \times r \\ r \times s \end{bmatrix} b, \qquad b = \frac{1}{V}\left(\phi_\alpha\, r + \phi_\beta\, s + \phi_\gamma\, t\right). \qquad (11)$$

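The parallelepiped $\Omega$ is now deformed by perturbing the placement of
the point $C$, i.e. by perturbing the vector $t$, leaving $r$ and $s$ unchanged. The
variation of $b$ with fluxes held constant is

$$\delta b\,|_{\delta\phi = 0} = -\frac{\delta V}{V^2}\left(\phi_\alpha\, r + \phi_\beta\, s + \phi_\gamma\, t\right) + \frac{\phi_\gamma}{V}\, \delta t. \qquad (12)$$

Once the variation is done, it is allowed to substitute back for $b$. Using (11)
and (7), one finds

$$\delta b\,|_{\delta\phi = 0} = \frac{1}{V}\left\{ -b\; (r \times s) \cdot \delta t + b \cdot (r \times s)\; \delta t \right\}. \qquad (13)$$

In case of a non-magnetic material, the energy functional (3) is

$$\Psi = V \rho^{\Psi}(b), \qquad \rho^{\Psi}(b) = \frac{|b|^2}{2\mu_0} \qquad (14)$$

and its variation with fluxes held constant is

$$\delta\Psi\,|_{\delta\phi = 0} = V\, \frac{b}{\mu_0} \cdot \delta b + \delta V\, \frac{|b|^2}{2\mu_0} = V\, \nabla u : \sigma_M(b) \qquad (15)$$

with, by using (9), (10) and the property $(a \cdot c)(b \cdot d) = (a\, b) : (c\, d)$,

$$\nabla u = \frac{1}{V}\, (r \times s)\, \delta t, \qquad \sigma_M(b) = \frac{1}{\mu_0}\left( b\, b - \frac{|b|^2}{2}\, I \right) \qquad (16)$$

which is the expression of the Maxwell stress tensor [1] associated with the
expression (14) of the energy density.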
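The factorization (15) can be checked by a finite-difference experiment: perturb $t$, keep the three facet fluxes fixed, and compare the change in energy with $V\, \nabla u : \sigma_M(b)$. The sketch below does this in plain Python. The geometry, the induction `b0` and the perturbation `dt` are illustrative values chosen here, and the helper names are not from the paper.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def energy(r, s, t, flux):
    """Energy (14), with b interpolated from the fluxes by the right-hand equation of (11)."""
    V = dot(cross(r, s), t)
    b = tuple((flux[0] * r[i] + flux[1] * s[i] + flux[2] * t[i]) / V for i in range(3))
    return V * dot(b, b) / (2.0 * MU0)

r = (1.0, 0.2, 0.0)
s = (0.1, 1.5, 0.3)
t = (0.0, 0.4, 2.0)
b0 = (0.3, -0.5, 0.8)  # illustrative uniform induction, in tesla

# Left-hand equation of (11): fluxes through the facets OBC, OCA and OAB.
flux = (dot(cross(s, t), b0), dot(cross(t, r), b0), dot(cross(r, s), b0))

# Small perturbation of t with the fluxes held constant.
dt = (1e-6, -2e-6, 1.5e-6)
t_new = tuple(ti + di for ti, di in zip(t, dt))
d_psi_fd = energy(r, s, t_new, flux) - energy(r, s, t, flux)

# Prediction from (15)-(16): V grad u : sigma_M(b), with V grad u = (r x s) dt.
n = cross(r, s)
d_psi_stress = (dot(n, b0) * dot(dt, b0) - 0.5 * dot(b0, b0) * dot(n, dt)) / MU0
```

The two numbers `d_psi_fd` and `d_psi_stress` should agree to first order in the size of the perturbation.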

2.3 Formulation in h 

Let $I_r$, $I_s$ and $I_t$ be the circulations obtained by applying the magnetomotive
force map (5) to the edges $OA$, $OB$ and $OC$ of $M$. If the parallelepiped
$\Omega$ is small enough, the (uniform) magnetic field in $\Omega$ is by definition the
vector $h$ that verifies the left-hand equation in (17). The right-hand equation
is the interpolated magnetic field $h(I, p)$, which is obtained by inversion of
the former equation.

$$\begin{bmatrix} I_r \\ I_s \\ I_t \end{bmatrix} = \begin{bmatrix} r \\ s \\ t \end{bmatrix} h, \qquad h = \frac{s \times t}{V}\, I_r + \frac{t \times r}{V}\, I_s + \frac{r \times s}{V}\, I_t. \qquad (17)$$



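The variation of $h$ with currents held constant is

$$\delta h\,|_{\delta I = 0} = \frac{1}{V}\left\{ s \times \delta t\; I_r + \delta t \times r\; I_s - \delta V\, h \right\} = -\frac{1}{V}\, (r \times s)\; h \cdot \delta t \qquad (18)$$

by (17) and using the variation of (8).

Considering again a non-magnetic material, the coenergy functional (4) is

$$\Phi = V \rho^{\Phi}(h), \qquad \rho^{\Phi}(h) = \mu_0 \frac{|h|^2}{2} \qquad (19)$$

and its variation with currents held constant gives

$$\delta\Phi\,|_{\delta I = 0} = V \mu_0\, h \cdot \delta h + \delta V\, \mu_0 \frac{|h|^2}{2} = -V\, \nabla u : \sigma_M(h) \qquad (20)$$

with

$$\nabla u = \frac{1}{V}\, (r \times s)\, \delta t, \qquad \sigma_M(h) = \mu_0 \left( h\, h - \frac{|h|^2}{2}\, I \right). \qquad (21)$$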



The classical expression of the Maxwell stress tensor in terms of h is found 
[1], although the intermediary steps were somewhat different. 



3 Magnetic materials 

In case of a permanent magnet material, the constitutive law and the energy
density are

$$b = \mu_0 (h + m), \qquad \rho^{\Psi}(b, m) = \frac{|b|^2}{2\mu_0} - b \cdot m \qquad (22)$$

where the magnetisation vector $m$ does not depend on the magnetic field
but may in general be a function of the deformation tensor $\varepsilon$. In order to
determine the Maxwell stress tensor associated with this particular expression
of the energy density, we need to choose the geometrical nature of $m$.

If one decides to have the magnetisation, denoted by $m^{(1)}$, represented
by a map $C_1(M) \to \mathbb{R}$, like the magnetomotive force map (5), one finds
by applying the procedure described in the previous section that the term
$b \cdot m^{(1)}$ will have no contribution to the Maxwell stress tensor, except the
one due to the explicit dependence of $m$ on $\varepsilon$.^2 Under this assumption,
the Maxwell stress tensor of a permanent magnet material is

= -(bh - (23) 

Mo V ^ / 

since Se = 

If one decides on the contrary to have the magnetisation, denoted by $m^{(2)}$,
represented by a map $C_2(M) \to \mathbb{R}$, like the magnetic flux map (1), one finds
then

— 6 b— b • I—b’dem^‘^\{24:) 

1^0 \ 2/io / 

which is different from the former. 

This shows that the distinction between 1-forms and 2-forms, which is
irrelevant for the expression of the magnetic constitutive law (22), becomes
essential when the magneto-mechanical coupling is considered. There is
however no mathematical reason to favour one of these expressions. The first
one might be better, for instance, when the magnetisation is actually due to
the presence of microscopic magnetic dipoles, whereas the second one might
better fit a magnetisation due to microscopic flux carriers (such as Abrikosov
vortices in high-Tc type-II superconductors).

In case of an isotropic reversible magnetic material, the magnetisation
is written $m = \chi(|h|, \varepsilon)\, h$, with $\chi$ the magnetic susceptibility. This gives a
good representation of polycrystalline saturable materials, like iron and
non-laminated steel. The magnetisation being a function of $h$, it is more natural
in this context to work with the coenergy:

$$\Phi = V \rho^{\Phi}(h), \qquad \rho^{\Phi}(h) = \int_0^{|h|} \mu_0 \big(1 + \chi(x, \varepsilon)\big)\, x\, dx. \qquad (25)$$

The associated Maxwell stress tensor is

$$\sigma_M(h) = \mu_0 \big(1 + \chi(|h|, \varepsilon)\big)\, h\, h - \rho^{\Phi} I - \mu_0 \int_0^{|h|} \partial_\varepsilon \chi(x, \varepsilon)\, x\, dx. \qquad (26)$$
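To make the coenergy density (25) concrete, the sketch below evaluates the integral numerically for a hypothetical saturation law $\chi(x) = \chi_0 / (1 + x / h_{sat})$, with the strain dependence ignored. This law and the values of `CHI0` and `H_SAT` are assumptions made only for illustration; for this particular choice the integral has a closed form, which is used as a cross-check.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability in H/m
CHI0 = 1000.0          # hypothetical initial susceptibility
H_SAT = 200.0          # hypothetical saturation field scale in A/m

def chi(x):
    """Hypothetical saturable susceptibility (strain dependence ignored)."""
    return CHI0 / (1.0 + x / H_SAT)

def coenergy_density(h_norm, n=2000):
    """Equation (25) evaluated with the composite Simpson rule (n must be even)."""
    dx = h_norm / n

    def integrand(x):
        return MU0 * (1.0 + chi(x)) * x

    total = integrand(0.0) + integrand(h_norm)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * dx)
    return total * dx / 3.0

H = 1000.0  # field magnitude in A/m
rho_num = coenergy_density(H)

# Closed form of the same integral for the assumed chi(x).
rho_exact = MU0 * (H ** 2 / 2.0 + CHI0 * H_SAT * (H - H_SAT * math.log(1.0 + H / H_SAT)))
```

Any other measured or fitted $\chi(|h|, \varepsilon)$ can be substituted for `chi` without changing the quadrature.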



4 The Riemannian case 

It is worth showing how much easier the definition of EM forces is in
differential geometry terms. Let us consider the case of a non-magnetic
material, for which the coenergy density is the following function

$$\rho^{\Phi} = \frac{\mu_0}{2}\, h \wedge \star h = \frac{\mu_0}{2} \sqrt{g}\; h_i\, g^{ij} h_j\; \pi, \qquad (27)$$

of the 1-form $h$, the Hodge operator $\star$, the volume form $\pi = dx \wedge dy \wedge dz$ and
the metric $g$, with components $g_{ij}$. Let $g$ be the determinant of the metric
matrix, and the $g^{ij}$'s the components of its inverse.

^2 This is due to the fact that the exterior product of a 1-form and a 2-form does
not involve the metric and is therefore insensitive to deformation.

The variation of the coenergy, with the field h held constant, writes simply 



$$\mathcal{L}_v \rho^{\Phi}\,\big|_{\mathcal{L}_v h = 0} \qquad (28)$$

with $v = \partial_t p(t)$ the velocity field. Thanks to the Lie derivative [9], there is no
need here to define the magnetomotive force map and the interpolated magnetic
field as distinct concepts.

Applying the rules of the Lie derivative and using the condition 
£.„ = 0 ^ = (29) 

to eliminate the terms like , one ends up with 



^-P^\£„h=o = hiidte^y^hj TT (30) 

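where the tensor $\partial_t \varepsilon^{\sharp}$ is related to the time derivative of the strain tensor
$\partial_t \varepsilon$ by

$$\partial_t \varepsilon^{\sharp} = \frac{1}{2} \mathcal{L}_v g^{-1}, \qquad (\partial_t \varepsilon^{\sharp})^{ij} = -g^{ip} (\partial_t \varepsilon)_{pq}\, g^{qj}, \qquad \partial_t \varepsilon = \frac{1}{2} \mathcal{L}_v g. \qquad (31)$$

Equation (30) factorizes as

$$\mathcal{L}_v \rho^{\Phi}\,\big|_{\mathcal{L}_v h = 0} = -\left\{ \rho^{f} \cdot v + \sigma_M : \partial_t \varepsilon \right\} \pi, \qquad (32)$$

which defines the force density $\rho^{f}$ and the Maxwell stress tensor $\sigma_M$
associated with the expression (27) of the coenergy: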

^ ^ ^nj (oo\ 

Pk ~ 2gdx^^ ’ — Pov 9 niO 9 l^j Q ^ (^^) 



Pl23- 



with 

Similar developments can be done for a b-formulation. One has



o^ih) - 






(34) 



and 



l£„6-o \2g dxP dxP ) /io 



\£^b=0 

Equation (35) factorizes as 



£vp"^ 



\£^b=0 



{p^ -v + aM ■ dts} 



hpqidte^Y^g'^^bij tt. (35) 



(36) 



which defines the force density p^ and the Maxwell stress tensor gm associ- 
ated with the expression (34) of the energy 
Comments. The expressions (33) and (37) of the Maxwell stress tensor of
a non-magnetic material in a general Riemannian metric reduce to (21) and
(16) in the case of a Euclidean metric, $g_{ij} = \delta_{ij}$. On the other hand, the force
density exists only where the metric field is not uniform. But, as we have
seen, the computation of a Maxwell stress tensor by this approach requires
quite involved algebra.

5 Conclusion 

The thermodynamic definition of EM forces in a continuous medium has been 
recalled. In a Riemannian metric, it can be applied directly thanks to the 
Lie derivative. However, in order to facilitate the calculation of the Maxwell 
stress tensor of more complex materials, a formalism expressed in classical 
terms, yet coordinate-free, has been proposed, which is based on the same
principles but applies only in the case of a Euclidean metric. In this case,
the Maxwell stress tensor has been found to be the fundamental expression
of the local electromechanical coupling in a continuous medium. It can be
readily used as an applied stress in structural computations. It can also serve
to compute electromagnetic forces: force densities are obtained by taking its
divergence, nodal forces by multiplying it with the gradient of a nodal
shape function [7] and resultant forces by integrating it over an enclosing
surface. According to the approach proposed in this paper, the Maxwell stress 
tensor of a particular medium is not inferred from a postulated expression of 
the force density, which is usually the case, but on the contrary systematically 
derived from the expression of the electromagnetic energy or coenergy density 
of the medium under consideration. The proposed procedure has been applied 
for permanent magnet materials and isotropic saturable magnetic materials. 
Reduced Order Modelling of RLC-networks
Using an SVD-Laguerre Based Method



Pieter Heres^1 and Wil Schilders^{1,2}

^1 Eindhoven University of Technology, Department of Mathematics and
Computing Science, Scientific Computing Group
^2 Philips Research Laboratory, Eindhoven, The Netherlands



Abstract. With interconnect increasingly contributing to the electrical behaviour 
of integrated circuits, both by higher frequencies and smaller dimensions, it becomes 
increasingly important to incorporate its behaviour into simulations of ICs. This can 
be done rather elegantly by summarizing interconnect behaviour into a compact or 
reduced order model which is then co-simulated with the circuit. A similar approach 
can be used in the case of more conventional printed circuit boards. The SVD- 
Laguerre algorithm proposed by Knockaert and De Zutter [4] can be used for this 
purpose. In this paper, we describe an efficient implementation of the algorithm 
for multiple inputs, and show how the mathematical reduced order models can be 
translated into realizable circuit elements. 



1 Introduction 

To increase their performance, the characteristic dimensions of ICs and printed 
circuit boards (PCBs) are decreased and will decrease even further in the 
future. Higher speed makes the effect of higher frequency modes on the in- 
terconnect more important. Therefore, the analysis of signal propagation on 
the interconnect system is important. However, this requires the solution of 
Maxwell’s equations which is rather demanding from the point of view of 
computation times. In addition, accurate modelling leads to large systems 
which can hardly be used in conventional circuit simulations. 

To be able to work with models for interconnect structures, a technique known 
as reduced order modelling is employed. This class of mathematical tech- 
niques is able to reduce the sizes of models while preserving their essential 
features. Classical techniques in this area are the asymptotic waveform eval- 
uation (AWE) method and the Pade-via-Lanczos (PVL) method. The latter 
is an efficient and robust implementation of the former. Recently, a new re- 
duction method was proposed by Knockaert and De Zutter [4]. We will take 
a closer look at this method and will show how this method can be used to 
make realizable circuits. 

The paper is built up as follows. In section 2, we briefly show how the
discretized Maxwell equations lead to an RLC model for the interconnect
system. Then, in section 3, the concept of transfer function is introduced,
relating the area of reduced order modelling to concepts used in systems and
control engineering where frequent use is made of state space models; the same
section discusses several points which are of interest when using reduced
order modelling techniques in the context of ICs and PCBs. In section 4 the
SVD-Laguerre method is explained, whereas in section 5 the efficient treatment
of multiple inputs is presented. The translation of the mathematical results
to a realizable circuit is discussed in section 6. In section 7, some numerical
results are given.

2 Discretization procedure 

The modelling of interconnect systems has gradually developed over the years. 
For DC situations interconnect can be modelled as a short, but as losses and 
inductances are becoming more important increased use is made of represen- 
tations using RLCG circuits. It is sufficient to restrict ourselves to the case 
in which the components R, L, G, and C are frequency independent. 

Both ICs and PCBs can be modelled by (large!) RLC-circuits. These models 
can be obtained via a discretization of the Maxwell equations. As an illustra- 
tion of this, we will very briefly review how this is done in [7]. 

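To calculate the electromagnetic fields in an electronic system, the Maxwell
equations must be solved:

$$\begin{aligned}
\nabla \times E &= -\frac{\partial B}{\partial t}, & J &= \sigma E, \\
\nabla \times H &= J + \frac{\partial D}{\partial t}, & B &= \mu H, \\
\nabla \cdot B &= 0, & D &= \varepsilon E, \\
\nabla \cdot D &= \rho.
\end{aligned}$$

After introducing a magnetic vector potential $A$ and an electric scalar
potential $\varphi$ the system can be rewritten as follows:

$$\begin{aligned}
(\Delta + k^2) A &= -\mu J, \\
\nabla \cdot (\varepsilon \nabla \varphi) + \varepsilon k^2 \varphi &= -\rho, \\
J = \sigma E &= \sigma(-\nabla \varphi + i\omega A), \qquad (1) \\
\nabla \cdot J - i\omega\rho &= 0
\end{aligned}$$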
with suitable boundary conditions. This system is discretized using a
boundary integral method [7] making frequent use of Green's functions. The final
discrete system can then be written in a form which is familiar to IC and
PCB designers:

(R - iuij)i - py = 0, 

-P^/ + iioMQ = 0 (2) 

- Dg = 0 

Here, the elements of the vector V are the potentials of the elements. The 
vector I consists of the current through the edges. Q contains the weights 
of the surface charge density, therefore its elements are the charges of the 
elements of the circuit. 
3 Transfer Functions, Approximation and Reduced
Order Modelling

It is convenient to rewrite RLC models in terms of a state space formulation.
Such formulations are of the form

$$C \dot{x} = -G x + B_i u, \qquad y = B_o^{T} x, \qquad (3)$$

where $B_i$ and $B_o$ are the matrices selecting the input and output, respectively.
This reformulation can be done both for MNA formulations of circuits, and
for the discretized system of Maxwell equations as derived in the previous
section.

An efficient and commonly used way to solve the state space system is via
the Laplace transform. Within this methodology, the so-called transfer
function is introduced. It is the function $H(s)$ giving the direct relation
between input and output in the frequency domain, where the frequency is
denoted by $s$. It is obtained by eliminating the state space vector $x$. The
parameter $s$ can be considered as the complex frequency $i\omega$. For (3) we have:

$$H(s) = B_o^{T} (G + sC)^{-1} B_i, \qquad (4)$$

such that $y = H(s)u$. A model which approximates the original model can be
called accurate if the transfer function of the original model is approximated
well by the transfer function of the approximating model.
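As an illustration of (4), the sketch below evaluates the transfer function of a small, hypothetical two-node RC ladder (current source at node 1, voltage observed at node 2). The circuit, its component values and the helper names `solve` and `transfer` are assumptions introduced for the example. A dense Gaussian elimination is used only because the system is tiny.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (real or complex)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x

def transfer(G, C, b_in, b_out, s):
    """Equation (4): H(s) = B_o^T (G + s C)^{-1} B_i for one input and one output column."""
    n = len(G)
    A = [[G[i][j] + s * C[i][j] for j in range(n)] for i in range(n)]
    x = solve(A, b_in)
    return sum(bo * xi for bo, xi in zip(b_out, x))

# Hypothetical two-node RC ladder: R1 between nodes 1 and 2, R2 from node 2 to ground.
R1, R2 = 100.0, 50.0      # ohm
C1, C2 = 1e-9, 2e-9       # farad
G = [[1 / R1, -1 / R1],
     [-1 / R1, 1 / R1 + 1 / R2]]
C = [[C1, 0.0],
     [0.0, C2]]
b_in = [1.0, 0.0]         # current injected at node 1
b_out = [0.0, 1.0]        # voltage observed at node 2

s = 2j * math.pi * 1e6    # s = i*omega at 1 MHz
H_dc = transfer(G, C, b_in, b_out, 0.0)
H_1MHz = transfer(G, C, b_in, b_out, s)
```

At zero frequency all the injected current flows through $R_2$, so `H_dc` should equal `R2`.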

As can be understood from the procedure summarized in section 2, the 
models obtained for interconnect systems on ICs or PCBs consist of (very) 
large systems of equations. This is not very convenient, and a coupling of 
these large systems with circuit equations is almost out of the question. With 
Reduced Order Modelling the original model is replaced by a model which 
is smaller, but has (approximately) the same properties. There is a danger, 
however, that some essential properties are lost during the mathematical 
procedure. Ideally, these properties should be preserved. 

In our search for a smaller circuit, describing approximately the same 
behaviour, an important issue is the preservation of stability and passivity. 
An RLC-circuit is passive, because it has no active components. Passivity 
is stronger than stability. A stable circuit can become unstable when
non-linear components are attached to its terminals. In contrast, a passive
circuit remains stable under all conditions.

The behaviour of a circuit and the transfer function are uniquely
determined by the poles and their associated residues. Poles can be calculated by:

$$\frac{-1}{\sigma(G^{-1}C)},$$

with $\sigma(G^{-1}C)$ the eigenvalues of $G^{-1}C$. Because the poles determine
the behaviour of the system, one can also approximate the poles. This is why
methods from the area of eigenvalue approximations are often well-suited
for these problems also. Examples of this are the Krylov subspace methods
PVL and PRIMA. In this paper we consider a new Krylov subspace method
which is very similar to the others, but with some very attractive properties:
SVD-Laguerre [4].



4 Some Theory Behind the Laguerre Method 



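The idea behind the Laguerre method is that the transfer function can be
expanded in terms of Laguerre functions. We consider the Laguerre functions
in the $s$-domain:

$$\Phi_n^{\alpha}(s) = \frac{\sqrt{2\alpha}}{s + \alpha} \left( \frac{s - \alpha}{s + \alpha} \right)^{n} \quad \text{for } n = 0, 1, 2, \ldots \qquad (5)$$

These functions form a uniformly bounded orthonormal basis in the frequency
domain for $s = i\omega$, with $\omega \in (0, \infty)$, for the space $\mathcal{H}_2$. The transfer function
can be expanded in terms of these functions (with $L$ and $B$ standing for
$B_o$ and $B_i$):

$$H(s) = \frac{2\alpha}{s + \alpha} \sum_{n=0}^{\infty} L^{T} \left( (G + \alpha C)^{-1} (G - \alpha C) \right)^{n} (G + \alpha C)^{-1} B \left( \frac{s - \alpha}{s + \alpha} \right)^{n} \qquad (6)$$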

Due to a lack of space we refer to [3], where the derivation
of this expression can be found. The matrices used in this Laguerre expansion
can be used to build up Krylov subspaces. An $n$-dimensional Krylov subspace
is defined by:

$$\mathcal{K}_n(b, A) = [b, Ab, \ldots, A^{n-1} b] \qquad (7)$$

The main part of the Laguerre method consists of building a Krylov subspace
with the vector $(G + \alpha C)^{-1} B_i$ and the matrix $(G + \alpha C)^{-1} (G - \alpha C)$.
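The Laguerre expansion of the transfer function can be checked by hand in the scalar case, where $G = g$ and $C = c$ are numbers and $H(s) = 1/(g + sc)$. The sketch below sums the first terms of the series and compares them with the exact value; the values of `g`, `c` and `alpha` are illustrative assumptions, and the function name is hypothetical.

```python
import math

def laguerre_partial_sum(g, c, alpha, s, n_terms):
    """Partial sum of the Laguerre expansion for the scalar system H(s) = 1 / (g + s c)."""
    a = (g - alpha * c) / (g + alpha * c)   # scalar version of (G + aC)^-1 (G - aC)
    z = (s - alpha) / (s + alpha)           # s-dependent factor, |z| = 1 on the imaginary axis
    total = sum((a * z) ** n for n in range(n_terms))
    return 2 * alpha / (s + alpha) * total / (g + alpha * c)

g, c = 1e-3, 1e-9            # illustrative conductance (S) and capacitance (F)
alpha = 5e6                  # illustrative Laguerre parameter (rad/s)
s = 2j * math.pi * 1e5       # s = i*omega at 100 kHz
exact = 1 / (g + s * c)

# Relative error of the truncated series for a few truncation lengths.
rel_error = {n: abs(laguerre_partial_sum(g, c, alpha, s, n) - exact) / abs(exact)
             for n in (5, 20, 100)}
```

In this scalar setting the error decays like $|(g - \alpha c)/(g + \alpha c)|^{n}$, so the choice of $\alpha$ directly controls how many terms are needed.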

The columns of the Krylov subspace are made orthogonal. In the original
article on the SVD-Laguerre method this orthogonalisation is done after all
columns of the Krylov space are created, by means of a Singular Value
Decomposition (SVD). But under repeated multiplication with a matrix the
vectors tend to one dominant direction. To avoid numerical artefacts, we propose
to perform this orthogonalisation during the generation of the columns. The
system matrices are projected onto this Krylov subspace, spanned by $V$:

$$\hat{G} = V^{T} G V, \qquad \hat{C} = V^{T} C V, \qquad \hat{B}_i = V^{T} B_i, \qquad \hat{B}_o = V^{T} B_o.$$

For a single input column $B_i$, the algorithm can be summarized as follows:

    Solve (G + αC) v_1 = B_i
    v_1 = v_1 / ||v_1||
    for j = 1, ..., k-1
        Solve (G + αC) t = (G - αC) v_j
        for i = 1, ..., j
            h_{i,j} = v_i^T t
            t = t - h_{i,j} v_i
        end
        h_{j+1,j} = ||t||
        v_{j+1} = t / h_{j+1,j}
    end
    Ĝ = V^T G V,  Ĉ = V^T C V,  B̂_i = V^T B_i,  B̂_o = V^T B_o

Although solving the linear system in this algorithm is quite expensive, it
only has to be done for one choice of $\alpha$, so we can for instance invest in an
LU decomposition to solve the system efficiently. Further note that the
following holds during the algorithm:
$(G + \alpha C)^{-1} (G - \alpha C) V_{k-1} = V_k H$, where $H$
is a Hessenberg matrix. The small matrix $H^{T} H$ can be used to approximate
the singular values of the matrix $(G + \alpha C)^{-1} (G - \alpha C)$ and can therefore be
used in a stopping criterion.
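The single-input algorithm can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the 4-node, dimensionless RC ladder is hypothetical, the function names are introduced here, and a dense Gaussian elimination is repeated at every step where a production code would factor $G + \alpha C$ once by LU.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (real or complex)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def laguerre_basis(G, C, b_in, alpha, k):
    """Orthonormal Krylov basis, orthogonalised while the columns are generated."""
    n = len(G)
    plus = [[G[i][j] + alpha * C[i][j] for j in range(n)] for i in range(n)]
    minus = [[G[i][j] - alpha * C[i][j] for j in range(n)] for i in range(n)]
    v = solve(plus, b_in)
    nrm = math.sqrt(dot(v, v))
    V = [[x / nrm for x in v]]
    for j in range(k - 1):
        t = solve(plus, matvec(minus, V[j]))
        for vi in V:                          # modified Gram-Schmidt sweep
            h = dot(vi, t)
            t = [tj - h * vij for tj, vij in zip(t, vi)]
        nrm = math.sqrt(dot(t, t))
        if nrm < 1e-12:                       # invariant subspace reached, stop early
            break
        V.append([x / nrm for x in t])
    return V                                  # list of basis vectors (columns of V)

def project(M, V):
    """V^T M V for a list of basis vectors V."""
    MV = [matvec(M, v) for v in V]
    return [[dot(vi, mvj) for mvj in MV] for vi in V]

def transfer(G, C, b_in, b_out, s):
    n = len(G)
    A = [[G[i][j] + s * C[i][j] for j in range(n)] for i in range(n)]
    return dot(b_out, solve(A, b_in))

# Hypothetical, dimensionless 4-node RC ladder (values are not from the paper).
G = [[1.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
C = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 1.5]]
b_in, b_out = [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]
alpha, s = 1.0, 2.0j

def reduced_transfer(k):
    V = laguerre_basis(G, C, b_in, alpha, k)
    bi, bo = [dot(v, b_in) for v in V], [dot(v, b_out) for v in V]
    return transfer(project(G, V), project(C, V), bi, bo, s)

H_full = transfer(G, C, b_in, b_out, s)
errors = [abs(reduced_transfer(k) - H_full) for k in (1, 2, 3, 4)]
V_full = laguerre_basis(G, C, b_in, alpha, 4)
```

With `k` equal to the full dimension the projection reproduces the original transfer function up to rounding, which gives a simple sanity check of the implementation.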



5 The Laguerre Algorithm for Multiple Input 

If an RLC model is considered with more than one input, the matrix $B_i$
obviously has more than one column, each of which describes one specific
input. The approximate model should then also allow more inputs and give
an accurate approximation for all of these. In fact, the transfer function has
become a transfer matrix, and we should have accurate approximations for
all entries of this matrix.

Multiple inputs imply that the Krylov subspaces are also larger. For
example, if two inputs are considered, the Krylov subspaces have a dimension
which is twice as large as that of the subspaces generated for one input:

/Cn(Bi, A) - . . • , (8) 

2 2 2 

Projecting onto these larger subspaces leads to system matrices which are
correspondingly larger. Hence, the reduction obtained is less, and we have to
be very careful with the number of columns generated. To find an appropriate
space which contains the information needed for several inputs, we propose
the following algorithm:

    for every input column B_m
        Calculate t = (G + αC)^{-1} B_m
        Make this vector orthogonal to the already existing columns
        Do k iterations of the Laguerre algorithm
            every column is made orthogonal to every other
        end
    end
    Project onto the Krylov space

Because every input column is treated independently and every generated column
is made orthogonal to the others, we expect the Krylov subspace to contain
less redundant information.

6 A Disadvantage of Reduction Methods 

Kirchhoff's current laws and the branch equations describing the RLC
model can be derived directly from the state space formulation.
Unfortunately, after reduction this is no longer possible. For instance, if a state
space vector $x$ is used, in general it will consist of branch currents and node
voltages. After projecting this vector onto a Krylov subspace, the rows have
lost their physical meaning. The input and output terminals are kept, but
the others may disappear. All well-known reduction methods (AWE [1], PVL
[2], PRIMA [5]) suffer from this problem. This includes the Laguerre method
presented in the previous sections.

The problem mainly consists of the fact that we cannot make use of a cir- 
cuit simulator in a direct way. Furthermore, starting from a model consisting 
of resistances, inductances and capacitors, it is desirable to have a reduced 
system which also consists of realizable or even passive components. This can 
not be done directly. However, there is a way to solve this problem via the 
Laguerre method, and we shall present this now. 

Reconsider the Laguerre expansion given before:

$$H(s) = \frac{2\alpha}{s + \alpha} \sum_{n=0}^{\infty} L^{T} \left( (G + \alpha C)^{-1} (G - \alpha C) \right)^{n} (G + \alpha C)^{-1} B \left( \frac{s - \alpha}{s + \alpha} \right)^{n} \qquad (9)$$

The advantage of this formulation is that the parameter $s$ is no longer part
of the inversion process for large matrices. The parts of this equation which
depend on $s$ can be represented by small filters. These filters are shown in
Figure 1.

The voltages implied by these filters have to be multiplied by a factor
$L^{T} ((G + \alpha C)^{-1} (G - \alpha C))^{n} (G + \alpha C)^{-1} B$ for every $n$ and then added, in order
to obtain the weighted summation. The circuit shown in Figure 2(a) stops at
$n$ terms. But this series converges very slowly, so all elements, or at least
many elements in the sum, must be taken into account. This can be done
by implementing a loop of filters, as shown in Figure 2(b). This realizable
circuit can be implemented in a circuit simulator. We used Pstar, which is
the Philips proprietary circuit simulation programme.

7 Experimental Results 

We applied the proposed algorithm to a PEEC model, and to some PCB
examples which were modelled as RLC-circuits. The PEEC method, which gives an
approximation for the behaviour of interconnect, was developed
by Ruehli [6]. The PEEC model we used is a nice example, because the
graph of the transfer function is rather intricate and hard to approximate.
The proposed Laguerre algorithm can approximate this example very well.
In Figure 3(a) the approximation of the Laguerre algorithm with $\alpha = 5 \cdot 10^{10}$
and $q = 92$ is shown. For the given frequency range no difference can be
observed. Of course we have to be careful with this result. An approximation
in the frequency domain does not guarantee a good approximation in the
time domain. Transient analysis should be applied to be sure that the result
is accurate.

Fig. 1. (a) The filter representing (b) The filter representing

Fig. 2. (a) The filter circuit, (b) The loop circuit

The other example is not chosen for its complexity, but to show that
it is possible to combine our filter realization with non-linear components.
We consider a PCB (see Fig. 3(b)) which, after discretization, can
be described by system matrices of size 460 x 460. This representation
can be reduced with the proposed method. We used a reduced order model
consisting of system matrices of size 60 x 60, in order to get a good
approximation up to 1 GHz. The loop filter representations of this kind of
model were implemented in Pstar and combined with each other and with other
components. The preliminary results are fine. Sometimes, (as yet unexplained)
strange behaviour is observed for lower frequencies. Further, some practical
issues have to be solved.
Fig. 3. (a) The transfer function of the PEEC model: the original and the Laguerre
approximation, for $\alpha = 5 \cdot 10^{10}$ and $q = 92$. (b) The PCB used in the second example.



8 Conclusion

We have shown a modified implementation of the SVD-Laguerre algorithm.
We are now able to deal with multiple inputs in an efficient way and we
orthogonalize during the process. The algorithm preserves stability and
passivity and leads to an accurate approximation.
Abstract. In this paper we show how the existing numerical time-integration
methods for the Maxwell equations can be handled in the framework of the theory
of operator splitting methods. We consider the classical Yee-method, the
Namiki-Zheng-Chen-Zhang alternating direction implicit method (NZCZ) and the
Kole-Figge-De Raedt method (KFR). The unconditional stability of the NZCZ-method
has so far been proven only by means of extensive use of computer algebra tools.
We give a purely mathematical proof. We compare the methods from the point of
view of accuracy and computational speed.



1 Introduction 



The mathematical model of electromagnetic problems can be written in the
form of the so-called Maxwell equations (in the source-free case)

$$-\nabla \times H + \varepsilon\, \partial_t E = 0, \qquad \nabla \times E + \mu\, \partial_t H = 0, \qquad (1)$$

$$\nabla \cdot (\varepsilon E) = 0, \qquad \nabla \cdot (\mu H) = 0, \qquad (2)$$

where $E$ and $H$ are the electric and magnetic field strengths, respectively, $\varepsilon$
is the electric permittivity and $\mu$ is the magnetic permeability. We have to
compute the fields $E$ and $H$ with some given boundary and initial conditions
and material parameters. Because $E$ and $H$ are supposed to satisfy (2) at
$t = 0$, we have to solve only system (1).

In real-life problems the exact solution of system (1) is very complicated
or even impossible; this is why numerical methods are generally applied.
The most frequently used one is the so-called Yee-method, which was
introduced for the Maxwell equations in 1966 ([13]). This is a finite difference
method, which applies a staggered spatial discretization and a so-called
leap-frog scheme in the time variable. The Yee-method is computationally equivalent
to the so-called Finite Integration Technique (FIT, see e.g. [11,12]), when
this technique is applied to Cartesian grids and combined with the leap-frog
time-integration scheme. The Yee-method suffers from a very strict stability
condition, namely, it is stable if and only if the condition

$$\Delta t < \frac{1}{c \sqrt{(1/\Delta x)^2 + (1/\Delta y)^2 + (1/\Delta z)^2}} \qquad (3)$$

is fulfilled, where $c$ is the maximal speed of light in the computational
domain, $\Delta x$, $\Delta y$ and $\Delta z$ are the grid sizes and $\Delta t$ is the time-step (see
e.g. [9]).
Because of the very high degree of integration of electronic devices and the
very wide frequency range applied, new demands are placed on
simulation methods. They have to be able to model, among others, the
coupling effects between interconnect systems and electronic circuits (such as
cross-talk), and the effects inside the wires (e.g. the skin effect). To simulate
the skin effect we must choose a very small grid size. This implies, according
to the upper bound (3), that we have to choose a very small time-step too,
which increases the computational time dramatically. A lot of effort has been
invested during the last decade to overcome the stability problem of the
Yee-method. The main goal was the construction of methods where $\Delta t$ can be
chosen based only on accuracy considerations instead of stability reasons. The
first paper to present an unconditionally stable method (called the
NZCZ-method), with a detailed proof of the stability, was [15]. The method was
also considered for more general problems in [8]. An unconditionally
stable numerical scheme (the KFR-method) is also given in [5]. In [4] we showed
that the above methods have common features and that they are all based on
matrix exponential approximation. In this paper we investigate the
applicability of operator splitting theory to the numerical solution of the
Maxwell equations. This allows us to compare the methods and paves the way to
the construction of new simulation methods in the future. The stability of the
NZCZ-method was proven with the help of significant use of computer algebra
(namely the software MAPLE V). We give the proof of unconditional stability
for nonhomogeneous media with purely mathematical tools.

The operator splitting method is very efficient in solving initial and bound- 
ary value problems for differential equations. The basic idea of the method is 
the splitting of the original problem into sub-problems according to the phys- 
ical processes involved. Then these sub-problems, which can be handled more 
easily, are solved sequentially using some appropriate methods. The solution 
of the original problem is approximated by the solutions of the sub-problems. 

We demonstrate the method on the system of ordinary differential equations
(ODEs)

\[
\Psi'(t) = A\Psi(t), \quad t \in (0,T], \qquad \Psi(0) \text{ is given}, \tag{4}
\]

where $A$ is a real square matrix and $\Psi$ is a real vector-valued function
of $t$. It is known that the solution of (4) can be written in the form
$\Psi(t) = \exp(tA)\Psi(0)$, where the exponential of the matrix $tA$ is
defined by the series of the exponential function. This shows that to compute
the exact solution of (4) we have to determine the exponential $\exp(tA)$
exactly. This is generally very difficult because of the infinite series of
the exponential function. Splitting the matrix into the form $A = A_1 + A_2$
and choosing a time-step $\tau$ we can define the sequence of systems of ODEs
(so-called sequential or S-splitting)

\[
\frac{d}{dt}\Psi_1^k(t) = A_1\Psi_1^k(t), \quad t \in (t_{k-1},t_k], \qquad
\Psi_1^k(t_{k-1}) = \Psi_2^{k-1}(t_{k-1}), \tag{5}
\]

\[
\frac{d}{dt}\Psi_2^k(t) = A_2\Psi_2^k(t), \quad t \in (t_{k-1},t_k], \qquad
\Psi_2^k(t_{k-1}) = \Psi_1^k(t_k), \tag{6}
\]

where $k = 1, 2, \dots, n$ ($T = n\tau$), $t_{k-1} = (k-1)\tau$ and
$\Psi_2^0(0) = \Psi(0)$. Solving these systems, the solution $\Psi$ of (4) can
be approximated at time-level $t_k$ by $\Psi_2^k(t_k)$. The error of the
splitting methods is defined as the difference between the exact and the
approximated solution at time-level $t_1 = \tau$. For the S-splitting method
we have

\[
\mathrm{Err}_{\mathrm{spl}}(\tau)
= \bigl(\exp(\tau(A_1 + A_2)) - \exp(\tau A_2)\exp(\tau A_1)\bigr)\Psi(0)
= \frac{\tau^2}{2}[A_1,A_2]\Psi(0) + O(\tau^3). \tag{7}
\]

It is clear from the above expression that the splitting error is zero for
every initial vector $\Psi(0)$ if and only if the matrices $A_1$ and $A_2$
commute, that is, their commutator, defined by $[A_1,A_2] = A_1A_2 - A_2A_1$,
is zero. This is the ideal case, but in general a splitting error occurs. It
can be seen from (7) that there is a strong connection between equations (5),
(6) and the exponential approximation
$\exp(\tau(A_1 + A_2)) \approx \exp(\tau A_2)\exp(\tau A_1)$. In a similar
manner, a second order splitting procedure can be defined by applying the
approximation

\[
\exp(\tau(A_1 + A_2)) \approx S_\tau := \exp((\tau/2)A_2)\exp(\tau A_1)\exp((\tau/2)A_2) \tag{8}
\]

(so-called Strang-splitting). Fourth order splitting can be obtained with the
approximation (see [14])

\[
\exp(\tau(A_1 + A_2)) \approx S_{\theta\tau}\,S_{(1-2\theta)\tau}\,S_{\theta\tau},
\qquad \theta = (2 - \sqrt[3]{2})^{-1}. \tag{9}
\]

The splitting method can be extended by applying more than two matrices in
the splitting, to partial differential equations or to non-linear problems (see
e.g. [3,6,7,10]).
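
The orders of the three approximations can be checked numerically. The sketch below is only an illustration under stated assumptions: the two 2x2 matrices are arbitrary non-commuting examples (not a Maxwell discretization), and `expm`, `s_split`, `strang`, `fourth_order` and `observed_order` are hypothetical helper names. It measures the one-step error at $\tau$ and $\tau/2$; since the global order is one less than the exponent of the local error, the expected local exponents are 2, 3 and 5.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring of a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def s_split(A1, A2, tau):
    """Sequential splitting: exp(tau*A2) exp(tau*A1)."""
    return expm(tau * A2) @ expm(tau * A1)

def strang(A1, A2, tau):
    """Strang splitting S_tau from (8)."""
    half = expm(0.5 * tau * A2)
    return half @ expm(tau * A1) @ half

def fourth_order(A1, A2, tau):
    """Composition (9) of three Strang steps."""
    theta = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
    outer = strang(A1, A2, theta * tau)
    return outer @ strang(A1, A2, (1.0 - 2.0 * theta) * tau) @ outer

# Illustrative non-commuting pair; A1 + A2 generates a plane rotation.
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [-1.0, 0.0]])

def local_error(method, tau):
    return np.linalg.norm(expm(tau * (A1 + A2)) - method(A1, A2, tau), 2)

def observed_order(method, tau=0.05):
    """Exponent p in error ~ tau**p, estimated from two step sizes."""
    return np.log2(local_error(method, tau) / local_error(method, tau / 2))

for name, method in [("S", s_split), ("Strang", strang), ("fourth order", fourth_order)]:
    print(name, round(float(observed_order(method)), 2))
```

For the S-splitting the measured one-step error can also be compared directly with the leading term $\frac{\tau^2}{2}\|[A_1,A_2]\|_2$ of (7).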

2 Splitting of the Semi-Discretized Maxwell Equations 

In this section the Yee-, NZCZ- and KFR-methods are analyzed and compared
in the framework of the operator splitting theory. The methods are compared
in terms of computational speed at an acceptable level of accuracy.

With the usual staggered semi-discretization of the Maxwell equations,
dividing the computational domain into $N$ so-called Yee-cells, we arrive at
a system of first order linear ordinary differential equations in the form (4).
The matrix $A \in \mathbb{R}^{6N \times 6N}$ is a sparse (at most four nonzero
elements per row) skew-symmetric matrix and the function
$\Psi : \mathbb{R} \to \mathbb{R}^{6N}$, $t \mapsto \Psi(t)$ gives the field
components (multiplied by the square roots of the material parameters) at
time $t$ (see e.g. [4,5]). In the solution of (4), an additional difficulty
is the very large size of $A$. Let us investigate how the operator splitting
method can be applied here in order to split the original system into more
easily (or exactly) solvable ones.

2.1 Splitting in the Yee-Scheme 

We split the matrix $A$ into the form $A = A_{1Y} + A_{2Y}$, where the
splitting is done according to the magnetic and electric fields, respectively.
$A_{1Y}$ is obtained from the matrix $A$ by replacing the rows belonging to the
electric field variables with zero rows, and $A_{2Y}$ is obtained, in a similar
manner, by zeroing the rows belonging to the magnetic field variables (see [4]).
From the operator splitting point of view the Yee-method is based on this
splitting. The exponentials of the matrices $A_{1Y}$ and $A_{2Y}$ can be
computed easily.

Lemma 1. We have the relations $\exp(A_{1Y}) = I + A_{1Y}$ and
$\exp(A_{2Y}) = I + A_{2Y}$, where $I$ is the identity matrix.

Proof. It is enough to notice that, because of their structure, the second and
higher powers of these matrices are equal to the zero matrix. Thus their
exponentials are given by the first two terms of the series of the exponential
function. □

Applying the S-splitting method, the sub-systems (5) and (6) can be solved
exactly because of Lemma 1. Thus the approximate solution can be determined
by the iteration scheme

\[
\Psi^{k+1} = \exp(\tau A_{2Y})\exp(\tau A_{1Y})\Psi^k
= (I + \tau A_{2Y})(I + \tau A_{1Y})\Psi^k, \tag{10}
\]

where $\Psi^0 = \Psi(0)$ is given and $\Psi^k$ stands for the approximation of
$\Psi$ at time-level $k\tau$. The time-integration error comes purely from the
operator splitting error. The method is reasonably fast in the calculation of
one iteration step, because the matrix exponentials in (10) are computed in an
explicit manner. The drawback of the method is its conditional stability:
$\tau$ has to be very small. So the Yee-method is not efficient in calculations
on a relatively long time-interval, and this cannot be improved by the
application of the Strang or the fourth order splitting either.
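
The structure behind Lemma 1 and iteration (10) can be illustrated on a small block matrix. This is a sketch under assumptions: the unknowns are ordered as (electric, magnetic), the coupling block `B` is a random stand-in for the discrete curl-type operator, and all names are hypothetical. It checks that the split matrices are nilpotent and that one step of (10) coincides with a leap-frog style update.

```python
import numpy as np

n = 4                                   # size of each field block (hypothetical)
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))         # stand-in for the electric/magnetic coupling
Z = np.zeros((n, n))

# Skew-symmetric system matrix with unknowns ordered as (electric, magnetic).
A = np.block([[Z, B], [-B.T, Z]])
A1Y = np.block([[Z, Z], [-B.T, Z]])     # rows of the electric variables zeroed
A2Y = np.block([[Z, B], [Z, Z]])        # rows of the magnetic variables zeroed
I = np.eye(2 * n)

def yee_step(psi, tau):
    """One step of iteration (10): (I + tau*A2Y)(I + tau*A1Y) psi."""
    return (I + tau * A2Y) @ ((I + tau * A1Y) @ psi)

def leapfrog_step(E, H, tau):
    """Update the magnetic block first, then the electric block with the new value."""
    H_new = H - tau * (B.T @ E)
    E_new = E + tau * (B @ H_new)
    return E_new, H_new

E0 = rng.standard_normal(n)
H0 = rng.standard_normal(n)
psi1 = yee_step(np.concatenate([E0, H0]), 0.1)
E1, H1 = leapfrog_step(E0, H0, 0.1)

print("A1Y squared is zero:", np.allclose(A1Y @ A1Y, 0.0))
print("step (10) equals leap-frog update:", np.allclose(psi1, np.concatenate([E1, H1])))
```

Because the squares vanish, the exponential series stops after the linear term, which is exactly the statement of Lemma 1.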



2.2 Splitting in the KFR-Scheme 

In the KFR-method the matrix $A$ is split into the sum of skew-symmetric
matrices, for which additionally the matrix exponential can be computed
exactly using the identity

\[
\exp\begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}
= \begin{pmatrix} \cos a & \sin a \\ -\sin a & \cos a \end{pmatrix}, \tag{11}
\]

where $a$ is an arbitrary constant. In order to apply the above equality, we
have to split $A$ into two, four and six parts in one, two and three
dimensions, respectively (see [5]). Thus we have the splitting
$A = A_{1K} + \dots + A_{pK}$ ($p$ = 2, 4 or 6). Similarly to the method
discussed in the previous section we can apply the S-, the Strang- or the
fourth order splitting for system (4). So we obtain systems of ordinary
differential equations with the coefficient matrices $A_{1K}, \dots, A_{pK}$.
Because the exponentials of these matrices can be computed with sine and
cosine functions, we need not solve the sub-systems numerically. This was the
case in the Yee-method too. Thus, for example, for the S-splitting we have the
iteration

\[
\Psi^{k+1} = \exp(\tau A_{pK}) \cdots \exp(\tau A_{1K})\Psi^k. \tag{12}
\]

Considering the fact that the exponential of a skew-symmetric matrix always
has unit 2-norm (namely it is orthogonal), the KFR-method is unconditionally
stable by construction. Because of the exact computation of the exponentials,
the time-integration error comes only from the operator splitting.
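
Identity (11) and the norm-preservation argument can be checked with a few lines of plain Python. The sketch sums the exponential series for the 2x2 block and compares it with the closed form; the value of `a`, the test vector and the function names are arbitrary illustrative choices.

```python
import math

def expm_2x2_series(m, terms=40):
    """Exponential of a 2x2 matrix (nested lists) from the truncated power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms + 1):
        # term <- term * m / k, so that term holds m**k / k!
        term = [[sum(term[i][l] * m[l][j] for l in range(2)) / k for j in range(2)]
                for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def rotation(a):
    """Closed form on the right-hand side of (11)."""
    return [[math.cos(a), math.sin(a)], [-math.sin(a), math.cos(a)]]

a = 0.7
series = expm_2x2_series([[0.0, a], [-a, 0.0]])
closed = rotation(a)
max_diff = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))

# Applying the orthogonal factor leaves the Euclidean norm of a vector unchanged.
v = [3.0, -4.0]
w = [closed[0][0] * v[0] + closed[0][1] * v[1],
     closed[1][0] * v[0] + closed[1][1] * v[1]]

print("series vs closed form:", max_diff)
print("norm before and after:", math.hypot(*v), math.hypot(*w))
```

Since every factor in (12) is built from such blocks, each step is a product of orthogonal matrices, whatever the value of $\tau$.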

2.3 Splitting in the NZCZ-Scheme 

The first terms in system (1) can be written in the form of the difference
of two spatial derivatives. According to these two derivatives we can split
the matrix $A$ into the form $A = A_{1N} + A_{2N}$, where $A_{1N}$ and
$A_{2N}$ are skew-symmetric matrices (see [4,15]). This splitting is applied in
the NZCZ-method. Applying the Strang-splitting with the matrices $A_{1N}$,
$A_{2N}$ we have to solve three systems of ordinary differential equations (see
expression (8)) with the coefficient matrices $(\tau/2)A_{2N}$, $\tau A_{1N}$
and $(\tau/2)A_{2N}$, respectively. These systems cannot be solved exactly so
we must apply numerical methods. To do this we split the second system into
two, employing $(\tau/2)A_{1N}$ twice. Solving the four systems by the explicit,
implicit, explicit and implicit Euler-methods, respectively, we obtain the
iteration $\Psi^{k+1} = U(\Delta t A)\Psi^k$, where

\[
U(\Delta t A) = (I - (\Delta t/2)A_{2N})^{-1}\,(I + (\Delta t/2)A_{1N})\,
(I - (\Delta t/2)A_{1N})^{-1}\,(I + (\Delta t/2)A_{2N}). \tag{13}
\]

(Here $\Delta t$ is the time-step of the numerical methods. It is set to
$\Delta t = \tau$.) In practice, the above iteration can be simplified to the
solution of two systems of linear equations with symmetric tridiagonal matrices
in each iteration step. In the following, a purely mathematical proof of the
stability of the NZCZ-method (with non-homogeneous material parameters) will be
given. We remark that a similar proof appeared in [2] after the submission of
this paper.

Theorem 2. Let $h = \min\{\Delta x, \Delta y, \Delta z\}$ and let
$q = c\Delta t/h$ be an arbitrary fixed positive number. Using staggered spatial
discretization and the NZCZ time-integration method, the numerical solution of
the Maxwell equations is unconditionally stable in 2-norm.

Proof. Unconditional stability means that for arbitrary $q = c\Delta t/h$ the
relation

\[
\|\Psi^k\|_2 \le K\,\|\Psi^0\|_2 \tag{14}
\]

is true for all $k \in \mathbb{N}$ with a constant $K$ (independent of $k$).
Since

\[
\|(I + sC)(I - sC)^{-1}\|_2 = 1 \tag{15}
\]

is valid for any skew-symmetric matrix $C$ and scalar $s$ (see [4]), (13)
implies the relation

\[
\|\Psi^k\|_2^2 = \|(U(\Delta t A))^k\Psi^0\|_2^2
\le \|(I - (\Delta t/2)A_{2N})^{-1}\|_2^2 \cdot \|I + (\Delta t/2)A_{2N}\|_2^2
\cdot \|\Psi^0\|_2^2. \tag{16}
\]

Because $A_{2N}$ is skew-symmetric, its eigenvalues can be written in the form
$\pm i\lambda_l$ ($l = 1, \dots, 3N$, $\lambda_l \ge 0$, $i = \sqrt{-1}$).
Applying this, we have the estimations

\[
\|(I - (\Delta t/2)A_{2N})^{-1}\|_2^2
= \varrho\bigl((I + (\Delta t/2)A_{2N})^{-1}(I - (\Delta t/2)A_{2N})^{-1}\bigr)
= \varrho\bigl((I - (\Delta t/2)^2A_{2N}^2)^{-1}\bigr)
= \frac{1}{\min_l\{|1 - (\Delta t/2)^2(\pm i\lambda_l)^2|\}}
= \frac{1}{1 + (\Delta t/2)^2\lambda_{\min}^2} \le 1, \tag{17}
\]

\[
\|I + (\Delta t/2)A_{2N}\|_2^2
= \varrho\bigl((I - (\Delta t/2)A_{2N})(I + (\Delta t/2)A_{2N})\bigr)
= \varrho\bigl(I - (\Delta t/2)^2A_{2N}^2\bigr)
= 1 + (\Delta t/2)^2\lambda_{\max}^2
\le 1 + \Bigl(\frac{c\Delta t}{h}\Bigr)^2 = 1 + q^2, \tag{18}
\]

where

\[
\lambda_{\max} = \max\{\lambda_1, \dots, \lambda_{3N}\}, \qquad
\lambda_{\min} = \min\{\lambda_1, \dots, \lambda_{3N}\} \tag{19}
\]

and $\varrho(\cdot)$ denotes the spectral radius. Furthermore, Gershgorin's
theorem and the
$1/(\sqrt{\varepsilon_{\cdot,\cdot,\cdot}\,\mu_{\cdot,\cdot,\cdot}}\,\Delta_{\cdot})$
form of the elements of $A_{2N}$ are applied to get an upper bound for
$\lambda_{\max}$. In the end we get that
$\|\Psi^k\|_2^2 \le (1 + q^2)\,\|\Psi^0\|_2^2$, that is, the choice
$K = \sqrt{1 + q^2}$ is satisfactory. □
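
The three norm identities used in the proof can be verified numerically. In this sketch the skew-symmetric matrix `C` is a random stand-in (not the operator $A_{2N}$ of a Maxwell discretization), and the helper `cayley` and the sample values of `s` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
C = M - M.T                       # random skew-symmetric test matrix
I = np.eye(6)

def cayley(C, s):
    """The factor (I + sC)(I - sC)^{-1} appearing in (15)."""
    return (I + s * C) @ np.linalg.inv(I - s * C)

# (15): the 2-norm equals one for small and large s alike.
cayley_norms = [np.linalg.norm(cayley(C, s), 2) for s in (0.01, 1.0, 100.0)]

# (17) and (18): the eigenvalues of C are +/- i*lambda_l.
lam = np.abs(np.linalg.eigvals(C).imag)
s = 0.5
norm_inv_sq = np.linalg.norm(np.linalg.inv(I - s * C), 2) ** 2
norm_fwd_sq = np.linalg.norm(I + s * C, 2) ** 2

print("norms in (15):", cayley_norms)
print("(17):", norm_inv_sq, "vs", 1.0 / (1.0 + s ** 2 * lam.min() ** 2))
print("(18):", norm_fwd_sq, "vs", 1.0 + s ** 2 * lam.max() ** 2)
```

The inverse factor never exceeds one and the explicit factor grows only with $s\lambda_{\max}$, which is why a single factor $1 + q^2$ bounds the whole iteration independently of $k$.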



Remark 3. For 3D problems the constant $q$ must be chosen according to the
inequality $q < 1/\sqrt{3}$ (here $h = \Delta x = \Delta y = \Delta z$) in the
classical Yee-method. In the NZCZ-method the parameter $q$ can be set
arbitrarily, which shows the unconditional stability of the method.

3 Comparison of the methods

The NZCZ- and KFR-methods are unconditionally stable, which allows an
arbitrary choice of $\Delta t$. Of course, as $\Delta t$ is enlarged, the
splitting error and the error of the explicit and implicit Euler methods
increase. The required accuracy results in a practical upper bound for
$\Delta t$. We compare the methods from the point of view of computational
speed, keeping the accuracy of the methods acceptable.

For the sake of brevity we consider the 1D Maxwell equations with the
exact solution

\[
E_z(t,x) = \sin(\pi x)\sin(\pi t), \qquad H_y(t,x) = -\cos(\pi x)\cos(\pi t). \tag{20}
\]

The numerical results with the Yee-, NZCZ- and KFR-methods are listed in
Table 1 ($x \in [0,1]$, $\Delta x = 1/500$). We denote the KFR-method with
fourth order splitting by KFR4, and the sign "-" means that the error of the
method is not acceptable (larger than 0.1). The error is given in the
$l_2$-norm (at time level 0.8) and the CPU-time in seconds. We can notice that



  Δt          Yee (×10^-6)           NZCZ (×10^-6)        KFR4 (×10^-6)
              error            CPU   error       CPU      error       CPU
  0.8         -                      -                    -
  0.08        -                      7463.5      0.04     -
  0.008       -                      78.034      0.43     -
  0.004       -                      21.285      0.88     -
  0.002       2.0510 × 10^[…]  0.36  7.0949      1.83     54022       0.62
  0.001       2.2865           0.70  3.5475      3.67     3792.2      1.22
  0.0005      2.3454           1.42  2.6606      7.34     244.40      2.49
  0.00005     2.3648          14.14  2.3680     72.15     2.3893     25.31
  0.000005    2.3648         148.48  2.3652    735.51     2.3415    247.87

Table 1. Computational results with the exact solutions in (20).



the KFR-method with fourth order splitting performs much worse than the
Yee-method, even though the Yee-method applies only the S-splitting. While
we get a very accurate solution with the Yee-method in 0.36 seconds (see
[9] regarding the magic time-step), the KFR4-method obtains an unusable
solution in the same computational time. What is the reason for this? We
can easily compare the Yee-method and the KFR1-method (KFR with S-splitting)
by applying them to the above 1D problem. The two methods have a similar
structure: they split the original system (4) into two sub-systems which are
solved exactly. Let us calculate the norm of the leading term in
expression (7). We get the results

\[
\Bigl\|\frac{\tau^2}{2}[A_{1Y},A_{2Y}]\Psi^0\Bigr\|_2 = 1.25 \times 10^{-3}, \qquad
\Bigl\|\frac{\tau^2}{2}[A_{1K},A_{2K}]\Psi^0\Bigr\|_2 = 0.73. \tag{21}
\]

This difference (almost three orders of magnitude) makes the KFR-method
relatively inaccurate. The difference can be explained by the presence of the
initial vector in the error expression and by the rather unnatural splitting in
the KFR-method. Although the KFR-method is unconditionally stable, a very small
step-size must be chosen to keep the accuracy acceptable. This step-size must
be much smaller than the maximal step-size of the Yee-method. Thus, in terms of
computational speed, the KFR-method is not better than the Yee-method.

For a given time-step the NZCZ-method is about five times slower than the
Yee-method. This drawback can be compensated by increasing the time-step. So,
in the long run, the NZCZ-method solves the equations faster than the
Yee-method ([1,4,15]). According to Table 1, the NZCZ-method is the
computationally most efficient one.



Simulating Multi-tone Free-Running
Oscillators with Optimal Sweep Following

Abstract. A new method for the simulation of circuits with widely-varying time
scales is given. The method splits the behaviour of the circuit into a
fast-varying and a slowly-varying component. The method is attractive because it
can handle frequency modulated (FM) circuits, unlike existing methods. Numerical
results are given.



1 Introduction 

With ever increasing operating frequencies of electric circuits, widely
varying time scales become increasingly common. The fast frequency $f_f$ can be
up to 10^[…] times the slow frequency $f_s$. A typical example would be a
circuit which generates a high-frequency carrier wave, which is modulated with
a low-frequency data signal. In such a situation, the period and/or amplitude
of the high-frequency signal changes slowly compared to the high-frequency
oscillation itself. Traditional simulation techniques have great difficulty
with such widely separated time scales, since they require an amount of
computation time $O(f_f/f_s)$. In order to simulate a complete waveform of
the low-frequency signal, such techniques have to simulate many waveforms
of the high-frequency signal.

Several approaches have been suggested to speed up the simulation of 
such a circuit with widely varying time scales. They can be split into two 
groups. 

1. Methods that are based on partitioning a circuit into a fast and a slow 
part, such as those described in [1]. These methods can be very effective 
when a good partitioning can be made, but this is not always possible. 

2. Methods that simulate the circuit as a whole, but attempt to split the 
low-frequency behaviour from the high-frequency behaviour. Examples of 
such methods are envelope-following techniques [4,5], and the Multi-rate 
Partial Differential Equation (MPDE) technique [2,8]. 

In this paper, a new method of type 2 is discussed. The method is named
Optimal Sweep Following. New in this method is that it handles autonomous
circuits in which Frequency Modulation (FM) takes place. MPDE as described
in [8] and Envelope Following only handle Amplitude Modulated (AM) circuits.
Modifications to MPDE to handle FM have been proposed; see [7].

The Sweep Following method splits the behaviour of the system into a
slowly-varying and a fast-varying component. The splitting is optimal in its
class.

The essential new feature of the method is that the frequency of the
fast-varying signal does not have to be provided a priori; rather, it is
determined by the method itself. As a result, the method can detect and handle
changes in the period of the fast-varying signal. This makes it suitable for
simulating Frequency Modulated circuits. Other methods require $f_f$ to be
provided a priori, and cannot deal with the situation where $f_f$ changes over
time. The fact that our method computes $f_f$ locally is also an advantage in
that this information is often desired by the circuit designer.

This paper is built up as follows. In section 2, we explain the notion 
of a sweep and show how a sweep can efficiently be represented. In section 
3, we outline the Sweep Following method. Finally, in section 4 we present 
a test problem and show that Sweep Following solves it. The test problem 
is an FM-modulated problem; it cannot be handled by MPDE or Envelope 
Following. 



2 Efficient representation of solutions with
widely-varying time-scales

Consider the following formulation of a Differential-Algebraic Equation (DAE).

\[
\frac{d}{dt}\mathbf{q}(\mathbf{x}) + \mathbf{j}(\mathbf{x}) = \mathbf{s}(t). \tag{1}
\]

The following definition is of importance throughout this paper.

Definition 1. A sweep of a set $C_0$ of initial states by (1) is the set
$\{\varphi(\mathbf{x}_0,t) \mid \mathbf{x}_0 \in C_0,\ t \ge 0\}$, where
$\varphi(\mathbf{x}_0,t)$ is defined as the solution of (1) with initial
condition $\mathbf{x}(0) = \mathbf{x}_0$ at time $t$.

The solution x of (1) may change at two very different time scales, as is the 
case in Fig. 1. Thus we have a fast oscillation, which itself slowly changes due 
to the low-frequency signal. This gives rise to “telephone cord” solutions as 
shown in Fig. 2. We call these “telephone cord” solutions because they are 
tightly wound in a spiral, just like the cord of a telephone. The key idea of 
the Sweep Following method is not to simulate the “telephone cord” itself, 
but rather the following two entities. 

1. The “tube” that is spanned by the cord. 

2. The speed of the rotation of the cord around the tube. 

It turns out that it is often cheaper to represent tube and speed rather than 
to directly represent the solution (“telephone cord”) itself. 

We now give a rigorous definition of the “tube” that represents the
solution 2. Suppose we have a set $C_0$ of initial values to (1), parametrised
by a parameter $s \in [0, 2\pi]$, i.e.
$C_0 = \{\mathbf{x}_0(s) \mid 0 \le s \le 2\pi\}$. Moreover, we require that the
parametrisation $\mathbf{x}_0$ is continuous, differentiable and periodic with
period $2\pi$. In addition, the right-hand side $\mathbf{s}$ is also a
continuous, differentiable function of $s$ and $t$.

Fig. 1. Transient solution of a modulated LC-oscillator

For a given $s \in [0, 2\pi]$, consider the solution $\mathbf{x}(s,t)$ to (1)
with initial condition $\mathbf{x}_0(s)$, i.e. the solution of the problem

\[
\frac{d}{dt}\mathbf{q}(\mathbf{x}(s,t)) + \mathbf{j}(\mathbf{x}(s,t)) = \mathbf{s}(s,t), \tag{2a}
\]

\[
\mathbf{x}(s,0) = \mathbf{x}_0(s). \tag{2b}
\]

The function $\mathbf{x}$ depends on two variables $s$ and $t$ and parametrises
a surface consisting of all solutions of (1) that have initial values in $C_0$.
This surface is the “tube” discussed earlier. Observe that this tube is exactly
the sweep of $C_0$ by (1). Since our method follows this sweep, rather than a
single solution, it is called “Sweep Following”. The parametrisation
$\mathbf{x}$ of the sweep induces a coordinate system $(s,t)$ on the sweep.
However, this coordinate system may be very skew. This implies that information
is not efficiently represented in this coordinate system. Therefore, we
investigate coordinate transformations



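of the form

\[
u = s + \alpha(t), \qquad v = t, \tag{3}
\]

for some differentiable function $\alpha$. The parametrisation in $(u,v)$
coordinates is called $\mathbf{x}_\alpha$. With this definition we have

\[
\mathbf{x}_\alpha(u,v) = \mathbf{x}_\alpha(s + \alpha(t),t) := \mathbf{x}(s,t), \tag{4}
\]

\[
\text{and} \quad \mathbf{s}_\alpha(u,v) = \mathbf{s}_\alpha(s + \alpha(t),t) := \mathbf{s}(s,t). \tag{5}
\]

From (4) and (2), we can derive

\[
\frac{d}{dt}\mathbf{q}(\mathbf{x}_\alpha(u,v)) + \mathbf{j}(\mathbf{x}_\alpha(u,v)) = \mathbf{s}_\alpha(u,v), \tag{6a}
\]

\[
\mathbf{x}_\alpha(u,0) = \mathbf{x}_0(s). \tag{6b}
\]

Fig. 2. Solutions of the modulated LC-oscillator in (w, q, t)-space for various
initial conditions.

By using (3), we can rewrite (6) into

\[
\alpha'(v)\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))
+ \frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v))
+ \mathbf{j}(\mathbf{x}_\alpha(u,v)) = \mathbf{s}_\alpha(u,v), \tag{7a}
\]

\[
\mathbf{x}_\alpha(u,0) = \mathbf{x}_0(u - \alpha(0)). \tag{7b}
\]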

The following observations are important to make. 

— The resulting formulation is a partial differential equation (PDE) in $u$
and $v$. Obviously, (7) reduces to an ordinary DAE along lines of constant
$s$; that is, lines of the form $u - \alpha(v) = \text{constant}$. Lines of
this form are the characteristics of (7). See also [6].

— The formulation is not unique: it depends on a certain choice for $\alpha$.

— From the way that (7) has been derived, it follows that it is well posed if
and only if (2) is well-posed.

The characteristics are all of the form

\[
u = \alpha(v) + c, \quad \text{for some } c \in \mathbb{R}. \tag{8}
\]

This means that characteristics can never intersect each other, and therefore 
no shocks in the solution can occur. 
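
The step from (6a) to (7a) is the chain rule applied along a line of fixed $s$. Spelled out with $u = s + \alpha(t)$ and $v = t$ from (3), so that $\partial u/\partial t = \alpha'(t) = \alpha'(v)$ and $\partial v/\partial t = 1$, the time derivative in (6a) becomes:

```latex
\frac{d}{dt}\,\mathbf{q}\bigl(\mathbf{x}_\alpha(s+\alpha(t),\,t)\bigr)
  = \frac{\partial u}{\partial t}\,
    \frac{\partial}{\partial u}\mathbf{q}\bigl(\mathbf{x}_\alpha(u,v)\bigr)
  + \frac{\partial v}{\partial t}\,
    \frac{\partial}{\partial v}\mathbf{q}\bigl(\mathbf{x}_\alpha(u,v)\bigr)
  = \alpha'(v)\,\frac{\partial}{\partial u}\mathbf{q}\bigl(\mathbf{x}_\alpha(u,v)\bigr)
  + \frac{\partial}{\partial v}\mathbf{q}\bigl(\mathbf{x}_\alpha(u,v)\bigr).
```

Substituting this expression for the first term of (6a) gives the two derivative terms of the PDE form, and setting $t = 0$ in $u = s + \alpha(t)$ gives $s = u - \alpha(0)$ for the initial condition.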

3 Outline of the method 

The basic idea of the method is now as follows. We start at some initial
sweep $C_0$. We select a number of points $\mathbf{x}_\alpha(u_i,0)$ on $C_0$,
and use (7) as an evolution equation in $v$ to find
$\mathbf{x}_\alpha(u_i,v_j)$ at later points $\{v_j\}$. Finally, we use
the relation (4) to find the solution of the original problem (1).

At first, it may appear that solving (7) is much more expensive than
solving (1). However, for a specific class of problems and for a suitable choice
of $\alpha$, the solution of (7) changes much more slowly than the solution of
(1). In this case, solving (7) allows much larger step sizes to be taken than if
we solve (1). This is the motivating idea behind the method.

This of course raises the question of how $\alpha$ should be selected. We want
to choose $\alpha$ in such a way that the step-size $\Delta v$ of the time
stepping in the $v$ direction can be taken as large as possible, without
compromising accuracy. In essence, we want the factor
$\frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v))$ from (7) to be
small; the smaller this factor is, the larger the step-sizes $\Delta v$ that
can be taken. Using (7) we find

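\[
\int_0^{2\pi} \Bigl\|\frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v))\Bigr\|^2 du
= \int_0^{2\pi} \Bigl\|\mathbf{s}_\alpha(u,v) - \mathbf{j}(\mathbf{x}_\alpha(u,v))
- \alpha'(v)\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\Bigr\|^2 du. \tag{9}
\]

We want to look for an $\alpha'(v)$ for which (9) becomes minimal. Basic
calculus shows that this happens when

\[
\alpha'(v) = \frac{\int_0^{2\pi} \bigl(\mathbf{s}_\alpha(u,v) - \mathbf{j}(\mathbf{x}_\alpha(u,v)),\
\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr)\,du}
{\int_0^{2\pi} \bigl\|\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr\|^2 du}. \tag{10}
\]

From (7a), we find that

\[
\mathbf{s}_\alpha(u,v) - \mathbf{j}(\mathbf{x}_\alpha(u,v))
= \alpha'(v)\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))
+ \frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v)). \tag{11}
\]

With that, we find that

\[
\alpha'(v) = \frac{\int_0^{2\pi} \bigl(\alpha'(v)\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))
+ \frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v)),\
\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr)\,du}
{\int_0^{2\pi} \bigl\|\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr\|^2 du}
= \alpha'(v) + \frac{\int_0^{2\pi} \bigl(\frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v)),\
\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr)\,du}
{\int_0^{2\pi} \bigl\|\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v))\bigr\|^2 du}. \tag{12}
\]

So we see that for this choice of $\alpha'$, we obtain

\[
\int_0^{2\pi} \Bigl(\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha(u,v)),\
\frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha(u,v))\Bigr)\,du = 0. \tag{13}
\]

I.e. there is an orthogonality between
$\frac{\partial}{\partial u}\mathbf{q}(\mathbf{x}_\alpha)$ and
$\frac{\partial}{\partial v}\mathbf{q}(\mathbf{x}_\alpha)$. Equation (10) gives
us a differential equation for $\alpha$. The system formed by (7) and (10) is a
complete evolution equation for $\mathbf{x}_\alpha$ and $\alpha$.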

In order to compute a solution of the system (7), (10), we have to discretise 
the system. We treat u as a space parameter and v as the time parameter. 
Then we use the method of lines, i.e. we first discretise in the space direction 
u. The resulting semi-discrete system can then be solved in the v direction 
by a choice of ODE methods. 

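For the discretisation in the $u$ direction, a number of points
$\{u_1, \dots, u_N\}$ is selected. The operator $\frac{\partial}{\partial u}$
can be discretised by a simple symmetric difference scheme. The
semi-discretised system now becomes

\[
\frac{d}{dv}\mathbf{q}(\mathbf{x}_\alpha(u_i,v)) = \mathbf{s}_\alpha(u_i,v)
- \mathbf{j}(\mathbf{x}_\alpha(u_i,v)) - \alpha'(v)\,\mathbf{d}_i(v)
\quad \text{for } i = 1, \dots, N, \tag{14a}
\]

\[
\alpha'(v) = \frac{\sum_i \bigl(\mathbf{d}_i(v),\ \mathbf{s}_\alpha(u_i,v) - \mathbf{j}(\mathbf{x}_\alpha(u_i,v))\bigr)}
{\sum_i \|\mathbf{d}_i(v)\|^2}, \tag{14b}
\]

where the $\mathbf{d}_i(v)$ are defined as

\[
\mathbf{d}_i(v) = \frac{\mathbf{q}(\mathbf{x}_\alpha(u_{i+1},v)) - \mathbf{q}(\mathbf{x}_\alpha(u_{i-1},v))}
{u_{i+1} - u_{i-1}}. \tag{15}
\]

Unfortunately, the computation of $\alpha'(v)$ is somewhat expensive, and the
computation of $d\alpha'(v)/dv$ is even more expensive. Recall that this choice
of $\alpha$ was only taken because it leads to a coordinate system that is
optimal in some sense. However, if this particular choice is too expensive, it
makes sense to take a slightly sub-optimal but much cheaper alternative. This
alternative is to keep $\alpha'$ constant between two successive time steps
$v_n$ and $v_{n+1}$ as follows.

\[
\alpha'(v) = \frac{\sum_i \bigl(\mathbf{d}_i(v_n),\ \mathbf{s}_\alpha(u_i,v_n) - \mathbf{j}(\mathbf{x}_\alpha(u_i,v_n))\bigr)}
{\sum_i \|\mathbf{d}_i(v_n)\|^2}, \quad \text{for } v \in [v_n, v_{n+1}). \tag{16}
\]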
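
The discrete choice of $\alpha'$ is an ordinary least-squares projection, which the following sketch illustrates on synthetic data. Everything here is an assumption made for illustration: the grid is uniform and periodic, the sampled $\mathbf{q}$ values lie on a unit circle, and the right-hand side is manufactured so that the optimal slope is known to be 3. The function names are hypothetical.

```python
import numpy as np

def symmetric_difference(q_vals, u):
    """Periodic symmetric difference (q_{i+1} - q_{i-1}) / (u_{i+1} - u_{i-1})."""
    step = u[1] - u[0]                      # uniform grid assumed
    return (np.roll(q_vals, -1, axis=0) - np.roll(q_vals, 1, axis=0)) / (2.0 * step)

def optimal_alpha_prime(rhs, d):
    """Least-squares slope sum_i (rhs_i, d_i) / sum_i ||d_i||^2."""
    return np.sum(rhs * d) / np.sum(d * d)

N = 64
u = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
q_vals = np.column_stack([np.cos(u), np.sin(u)])   # synthetic samples of q(x_alpha(u_i, v))
d = symmetric_difference(q_vals, u)

# Manufactured right-hand side s - j: three times d plus a part orthogonal to d.
rhs = 3.0 * d + q_vals
alpha_prime = optimal_alpha_prime(rhs, d)
residual = rhs - alpha_prime * d                   # plays the role of dq/dv in (14a)

print("alpha' =", alpha_prime)
print("sum_i (residual_i, d_i) =", np.sum(residual * d))
```

The vanishing sum in the last line is the discrete counterpart of the orthogonality relation (13): once the component along the $u$-derivative has been removed, what remains as the $v$-derivative has no projection onto it.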



4 Test problem 

A simple test problem has been constructed, which captures the essence of 
the problems we want to solve. We take a simple modulated LC oscillator. 
The capacitor is controlled by some external source. 



Cw = Qq, 


(17a) 


1 

II 


(17b) 


\[
\theta = \theta_{\mathrm{avg}} + \theta_{\mathrm{amp}}\cos(\omega_e t). \tag{17c}
\]

The choice of parameters in the simulations is $L = C = 1$,
$\theta_{\mathrm{avg}} = 2$, $\theta_{\mathrm{amp}} = 1$ and $\omega_e = $ […].
A transient simulation of (17) is shown in Fig. 1. We find that if we take
$\theta$ to be the constant $\theta_{\mathrm{avg}} + \theta_{\mathrm{amp}}$ in
problem (17), and initial values $w_0 = 1$ and $q_0 = 0$, we obtain a solution
where $(w,q)$ describe an ellipse around 0 with radii 1 and $\sqrt{3}$. This
ellipse will be taken as our initial set $C_0$. If we take a few points on this
ellipse, and compute the solution for each of these initial values, we obtain
Fig. 2. This figure gives a good idea of the sweep that is tracked. Note that
the computed solutions all vary very rapidly.

We now apply the Sweep Following idea, i.e. we compute $\alpha'$. The
computed curve is shown in Fig. 3. This $\alpha'$ is then used to compute a
solution $\mathbf{x}_\alpha$, which is shown in Fig. 4. Note the smooth, flat
behaviour of $\mathbf{x}_\alpha$ as a function of $v$. Since the computed
curves vary much more slowly in Fig. 4 than in Fig. 2, we see that it is now
possible to take much larger step-sizes in the $v$ direction. This in turn
leads to a considerable gain in efficiency.

So far, the method has not yet been tested on realistic circuits. Therefore,
two questions may arise.

1. Does the system (7), (10) always have a solution?

2. Will there be a speed-up compared to traditional transient analysis?

In [3], we show that the answer to the first question is affirmative, provided
that the underlying DAE (1) has solutions for all initial values in C. The
second question remains open.



Abstract. The paper presents an efficient numerical method to extract accurate
R, L, C parameters of passive on-chip structures. This method is based on two
main original ideas. First, the accuracy is controlled by using two complementary
approaches based on scalar and vector potential, which provide lower and upper
bounds for the extracted parameter. The convergence is accelerated by using the
Richardson extrapolation of the average value of the two complementary bounds.
Second, the field equations are solved by a multigrid finite element method with
local adaptive mesh subgridding. The refining process is stopped as soon as the
desired accuracy is reached.



1 Introduction 

An important step in integrated circuit design verification is the extraction
of the equivalent circuit to be simulated with SPICE, starting from the layout
description. Present design tools are based on very simple electromagnetic field
models (usually, a uniform field is assumed). The importance of parasitic
effects in high frequency integrated circuits calls for accurate modeling of
field effects.

Extraction of R, L, C parameters of on-chip components (p.u.l. values in the
case of interconnect transmission lines) is apparently a simple static field
problem, based on the Laplace equation satisfied by the scalar potential.
However, in real life circuits, millions of different geometric problems have to
be solved to design an integrated circuit. The main difficulty is related to the
computational speed and then to the solution accuracy. These two aspects are
closely related, since decreasing the acceptable error requires a larger CPU
time to solve the problem. Therefore, reliable control of the accuracy of the
numerical solution is essential in order to extract the desired parameters with
minimum CPU time and to avoid wasting computational resources.



2 Complementary formulation of static field problems 

To simplify the presentation only the case of resistance extraction for plane
homogeneous conducting layers is considered. The problem can be formulated in
terms of the scalar potential $V = V(x,y)$, where
$\mathbf{E} = -\operatorname{grad} V$, and $V$ is the solution of the Laplace
equation $\Delta V = 0$, with Dirichlet boundary conditions $V = 0$ on $S_{D1}$,
$V = V_0$ on $S_{D2}$, and the Neumann boundary condition
$\partial V/\partial n = 0$ on $S_{N1}$, where $S_{D1}$, $S_{D2}$ are the two
terminals having potentials $V = 0$ and $V = V_0$ respectively and
$S_{N1} = \partial D - (S_{D1} \cup S_{D2})$.

The problem can also be formulated in terms of the vector potential
$\mathbf{T} = \mathbf{k}\,T(x,y)$, where $\mathbf{J} = \operatorname{curl}\mathbf{T}$,
and $T$ is the solution of the Laplace equation $\Delta T = 0$, with Dirichlet
boundary conditions $T = 0$ on $S_{D3}$, $T = T_0$ on $S_{D4}$, and the Neumann
boundary condition $\partial T/\partial n = 0$ on $S_{N2}$. The parameter
$T_0 = I/g$, where $I$ is the total current and $g$ is the thickness of the
conductive layer.

The energy functional is in both cases equal to the power losses:

\[
W_V = \sigma \int_D (\operatorname{grad} V)^2\,dv; \tag{1}
\]

\[
W_T = \rho \int_D (\operatorname{curl}\mathbf{T})^2\,dv = \rho \int_D (\operatorname{grad} T)^2\,dv, \tag{2}
\]

where $\sigma = 1/\rho$ is the electric conductivity.

According to the variational formulation, the value of the energy functional
is minimal for the real field, being larger for any other field distribution
[1]. If $W_V$ and $W_T$ represent the power losses associated with the numerical
solutions of the scalar potential and vector potential problems, the real power
loss $W$ satisfies $W \le W_T$, $W \le W_V$. It follows that:

\[
R_V = \frac{V_0^2}{W_V} \le \frac{V_0^2}{W} = R_e, \tag{3}
\]

\[
R_T = \frac{W_T}{I^2} \ge \frac{W}{I^2} = R_e, \tag{4}
\]

so the complementary bounds are obtained: $R_V \le R_e \le R_T$, where $R_e$ is
the exact value of the resistance and $R_V$, $R_T$ are the values of the
resistance extracted from the scalar potential and the vector potential,
respectively.



3 The convergence acceleration 

3.1 Averaging the bounds 

The two complementary bounds provide a reliable method to control the accuracy
of the numerical result, using the relative error:

\[
\varepsilon_r = \frac{R_T - R_V}{R_a}. \tag{5}
\]

After computing the two bounds, the returned result is $R_a \pm \varepsilon_r$,
where $R_a = (R_V + R_T)/2$ is the average value of the two bounds.
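
As a small illustration of how the two bounds are combined, the sketch below returns the average and the relative width of the bracket. The function name and the sample resistances are hypothetical, and the relative error is taken as the width of the bracket divided by the average, which is an assumption about the exact normalisation.

```python
def combine_bounds(r_lower, r_upper):
    """Average of the two complementary bounds and the relative width of the bracket."""
    if r_lower > r_upper:
        raise ValueError("lower bound exceeds upper bound")
    r_avg = 0.5 * (r_lower + r_upper)
    rel_err = (r_upper - r_lower) / r_avg
    return r_avg, rel_err

# Hypothetical bounds from a scalar-potential and a vector-potential solve, in ohms.
r_avg, rel_err = combine_bounds(9.8, 10.2)
print(f"R_a = {r_avg:.3f} ohm, relative error = {rel_err:.3%}")
```

A refinement loop can stop as soon as `rel_err` falls below the requested tolerance, since the exact resistance is guaranteed to lie between the two bounds.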

The two potentials will be computed using a numerical method for Laplace
equations, such as the Finite Difference Method (FDM), the Finite Element
Method (FEM) or the Finite Integration Technique (FIT), all based on a
rectangular mesh which discretizes the computational domain $D$. The larger the
number of nodes $n$ of that mesh, the lower the upper bound and the higher the
lower bound, so that the exact value is localized more and more accurately.

Therefore, we propose the following approach to control the accuracy: start
with a coarse mesh and refine it recursively until the relative error becomes
lower than an imposed value. This technique is illustrated in fig. 1, where $h$
is the mesh norm.




Fig. 1. Richardson extrapolation 



The first technique we propose to accelerate the convergence of the iterative
process is based on the use of the average value $R_a$ instead of either of the
bounds $R_T$ or $R_V$.

To evaluate the efficiency of this technique, the resistance of an L-shaped
plate was extracted using FEM with rectangular first order elements. At each
step $k$ the number of nodes (DOFs) $n_k$ increases about four times,
$n_k \approx 4n_{k-1}$. The exact resistance $R_e$ of such a plate can be
computed using conformal transforms [2]. The relative errors of the extracted
resistances $R_V$, $R_T$ and of their average $R_a$,

\[
\varepsilon_V = \frac{R_V - R_e}{R_e}, \qquad
\varepsilon_T = \frac{R_T - R_e}{R_e}, \qquad
\varepsilon_a = \frac{R_a - R_e}{R_e}, \tag{6}
\]

are represented versus the number of DOFs in fig. 2. The acceleration effect of
the averaging is obvious.

If the relation between $\varepsilon$ and n in double logarithmic scale (as in fig. 2) is approximated by a line:

$$\lg \varepsilon = \lg \varepsilon_0 - R \lg n, \qquad (7)$$

the following relation between error and DOFs number is obtained:

$$\varepsilon = \varepsilon_0 n^{-R}, \qquad (8)$$

where $\varepsilon_0$ is the "initial error" and R is the convergence rate. The values of relative errors and their convergence rates for several methods are given in table 1.

Fig. 2. Relative errors versus number of DOFs.
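The straight-line fit behind (7) and (8) can be sketched as an ordinary least-squares regression in double logarithmic scale. This is only an illustration: the function name `fit_power_law` and the synthetic data (generated from an assumed $\varepsilon_0 = 0.02$, $R = 0.7$) are not taken from the paper, and the fit works on error magnitudes, so the sign of $\varepsilon_0$ has to be restored separately.

```python
import math

def fit_power_law(n_values, errors):
    """Least-squares fit of lg|eps| = lg(eps0) - R*lg(n); returns (|eps0|, R)."""
    xs = [math.log10(n) for n in n_values]
    ys = [math.log10(abs(e)) for e in errors]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return 10.0 ** intercept, -slope

# Synthetic check (assumed values): errors that follow eps = 0.02 * n**(-0.7) exactly.
n_levels = [65, 225, 833, 3201, 12545]
errors = [0.02 * n ** -0.7 for n in n_levels]
eps0, rate = fit_power_law(n_levels, errors)
```

Feeding the function measured errors from successive refinement levels gives the "initial error" and convergence rate in the sense of (8).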

3.2 The Richardson extrapolation 

The second method used to accelerate the convergence is based on the Richardson extrapolation technique. Based on (8), suppose the following approximation for the average resistance at iteration k: $R_a^{(k)} = R_e (1 + \varepsilon_0 n_k^{-R})$, and at iteration k + 1: $R_a^{(k+1)} = R_e (1 + \varepsilon_0 4^{-R} n_k^{-R})$, because $n_{k+1} = 4 n_k$. Eliminating $\varepsilon_0 n_k^{-R}$, the following Richardson extrapolation for the exact resistance $R_e$ is obtained:

$$R_\infty = \frac{4^R R_a^{(k+1)} - R_a^{(k)}}{4^R - 1} \qquad (9)$$

In the FEM case, the relative error of the Richardson extrapolation, $\varepsilon_\infty = (R_\infty - R_e)/R_e$, is approximately 2 times lower than the average error (table 1).
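A minimal sketch of the extrapolation step (9), assuming the error model (8) holds exactly. The numerical values (`R_EXACT`, `EPS0`, `RATE`) and the helper `averaged_resistance` are hypothetical and serve only to generate two consecutive refinement levels.

```python
def richardson(r_coarse, r_fine, rate, growth=4.0):
    """Extrapolate two successive averaged resistances, cf. equation (9).

    growth is the ratio n_{k+1} / n_k of the numbers of nodes (4 in the text).
    """
    factor = growth ** rate
    return (factor * r_fine - r_coarse) / (factor - 1.0)

# Hypothetical model data obeying R_a = R_e * (1 + eps0 * n**(-R)).
R_EXACT = 1.0
EPS0 = -0.0131
RATE = 0.667

def averaged_resistance(n):
    return R_EXACT * (1.0 + EPS0 * n ** -RATE)

n_k = 833
r_k = averaged_resistance(n_k)
r_k1 = averaged_resistance(4 * n_k)
r_inf = richardson(r_k, r_k1, RATE)
```

When the model is exact, the extrapolated value coincides with the exact resistance up to rounding. With real data the rate R is itself only an estimate, so the gain is smaller.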



4 Performance evaluation of iterative methods 

The method chosen for solving the linear system of equations obtained from the discretization has a very important influence on the global performance of the algorithm. The linear system obtained from FEM, FDM or FIT is in general of high dimension (10^ - 10® unknowns), but the matrix is symmetric, positive definite and sparse (with at most 9 nonzero elements per row).

The reference method for solving such systems is the conjugate gradient (CG) method [3]. The main idea of this method is that the solution of the linear system $Ax = b$ minimizes the functional $f(u) = \frac{1}{2} u^T A u - u^T b$. The method is a realization of orthogonal projection onto Krylov subspaces. Its advantage is that at each iteration k the main computational effort consists of the multiplication between a vector $d_k$ and the matrix A. CG is a semi-iterative method because, on an ideal computer, the solution is obtained after a number of iterations n = size(A), if $A = A^T$ and $A > 0$. In practice, fewer iterations (10 - 15) are needed, but this number depends on the condition number of the matrix A.

Table 1. Relative error of extracted results, convergence rates and initial errors

            Scalar potential ε_v                      Vector potential ε_t
n (DOF)     FIT          FDM          FEM          FIT          FDM          FEM
65          -0.0136122   -0.0076849   -0.0111396   0.0041446    0.0026230    0.0087394
833         -0.0023611   -0.0009849   -0.0016455   0.0009287    0.0006715    0.0014923
12545       -0.0003948   -0.0001446   -0.0002533   0.0001602    0.0001235    0.0002443
R           0.6594671    0.7074254    0.6899686    0.6479846    0.6672742    0.6243531
ε_0         -0.0308983   -0.0226166   -0.0295839   0.0046511    0.018581     0.0201327

            Average ε_a                               Richardson ε_∞
n (DOF)     FIT          FDM          FEM          FIT          FDM          FEM
65          -0.0047338   -0.0025310   -0.0012001   -0.0006038   0.0013325    0.0005354
833         -0.0007162   -0.0001567   -0.0000766   -0.0001790   0.0000762    0.0000358
12545       -0.0001173   -0.0000106   -0.0000045   -0.0000329   0.0000034    0.0000025
R           0.6671075    0.9931547    1.04516      0.6245892    1.146584     0.9814212
ε_0         -0.0131236   -0.0103792   -0.0047256   -0.0006038   0.0013325    0.0005354

Table 2. CPU Time and number of iterations (Rv/Rt)

            No. of iterations                         CPU time [s]
DOFs        GS            DPCG        ICCG         GS        DPCG     ICCG
65          373/53        29/19       12/8         0.00      0.01     0.00
833         5420/779      110/76      36/25        0.9       0.06     0.35
12545       68473/10169   1419/292    129/90       274.86    4.9      84.71

To improve the convergence rate of CG, preconditioning techniques can be applied. Among them, the best known are Diagonal Preconditioning (DPCG) and Incomplete Cholesky preconditioning (ICCG).

Unfortunately, complicated preconditioning schemes, such as ICCG, are not efficient in all cases. Even if the number of iterations decreases, the total CPU time can increase due to the computational effort of the Cholesky factorization, as can be seen in table 2. Therefore, it is much more useful to measure the efficiency of an iterative method not in number of iterations but in CPU time (preferably expressed in MFlops rather than in seconds, since the latter depends on the computer used).
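The CG iteration with an optional diagonal preconditioner can be sketched as follows. This is a dense, pure-Python illustration rather than the solver used for the tables: the small tridiagonal test matrix is an assumption chosen only because it is symmetric and positive definite, and a production code would store the sparse matrix in compressed form.

```python
def mat_vec(a, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(a, b, diag_precond=False, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the start vector x = 0
    inv_diag = [1.0 / a[i][i] if diag_precond else 1.0 for i in range(n)]
    z = [ri * mi for ri, mi in zip(r, inv_diag)]  # preconditioned residual
    d = list(z)  # search direction d_k
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(1, max_iter + 1):
        ad = mat_vec(a, d)  # the one matrix-vector product per iteration
        alpha = rz / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [ri * mi for ri, mi in zip(r, inv_diag)]
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return x, max_iter

# Assumed test problem: diagonally dominant tridiagonal matrix, exact solution = 1.
n = 20
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0 + i
    if i > 0:
        A[i][i - 1] = -1.0
    if i < n - 1:
        A[i][i + 1] = -1.0
b = mat_vec(A, [1.0] * n)
x_cg, it_cg = conjugate_gradient(A, b)
x_pcg, it_pcg = conjugate_gradient(A, b, diag_precond=True)
```

Comparing `it_cg` with `it_pcg`, together with wall-clock timings, mirrors the kind of comparison reported in table 2, although on such a tiny system the numbers are not representative.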

Fig. 3 shows the variation of the CPU time with respect to the refinement level of the mesh, which can be approximated by $\lg T = \lg T_0 + R_T \lg n$, or equivalently:

$$T = T_0 n^{R_T}, \qquad (10)$$

a relation that gives the complexity order $R_T$ of the studied iterative methods.

Fig. 3. Times for Iterative Methods

From the user's point of view, the relation between the computational time T and the relative error obtained is more important than the relations between $\varepsilon$ and n (8) or T and n (10).

Fig. 4. Performance of iterative methods (horizontal axis: CPU time)

Eliminating n between (8) and (10), the following relation is obtained (represented in fig. 4):

$$\alpha = \alpha_0 T^P, \qquad (11)$$

where $\alpha = 1/\varepsilon$ is defined as the "solution accuracy", $P = R/R_T$ is the "performance rate" and $\alpha_0 = 1/(\varepsilon_0 T_0^P)$ is the "initial accuracy".

Table 3. Performance parameters

Method        ε_0          R        T_0          R_T      α_0      P
FIT + GS      0.0131236    0.666    6.360E-09    2.2518   20214    0.29
FIT + DPCG    0.0131252    0.660    1.176E-07    1.7835   27915    0.37
FIT + ICCG    0.0131234    0.6711   1.238E-07    2.0826   12807    0.32
FEM + FVM     0.0047334    0.8063   3.927E-10    2.3663   1986     0.34
FEM + SMG     0.0047334    0.8839   2.104E-10    2.3663   4655     0.37


The performance of a method is characterized by the values of $\alpha_0$ and P. When comparing two iterative methods from the performance point of view, the more efficient one is the one with greater values of both parameters. If this is not the case, say method A has a greater $\alpha_0$ than method B but a smaller P, then method A is appropriate for fast but approximate computations, while method B is appropriate for high accuracy computations.

Table 3 presents the values of the performance parameters for different solving techniques: GS = Gauss-Seidel, DPCG = Diagonal Preconditioning Conjugate Gradient, ICCG = Incomplete Cholesky Conjugate Gradient, FVM = Full V Cycle Multigrid and SMG = Simple Multigrid [4].
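As a worked check of relation (11), the two performance parameters can be recomputed from the fitted values of (8) and (10). The sketch below uses the FIT + GS row of table 3 as input and assumes that eliminating n gives $P = R/R_T$ and $\alpha_0 = 1/(\varepsilon_0 T_0^P)$; the function name is illustrative.

```python
def performance_parameters(eps0, rate, t0, rate_t):
    """Combine eps = eps0 * n**(-R) and T = T0 * n**R_T into alpha = alpha0 * T**P."""
    p = rate / rate_t
    alpha0 = 1.0 / (abs(eps0) * t0 ** p)
    return alpha0, p

# Input values from the FIT + GS row of table 3.
alpha0, p = performance_parameters(0.0131236, 0.666, 6.360e-09, 2.2518)
```

The result lies within about one percent of the tabulated $\alpha_0$ = 20214 and is consistent with the tabulated P = 0.29. The small difference is what rounding of the published inputs would be expected to produce.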



5 Adaptive Multigrid Method 

The algorithms described above were based on a hierarchy of uniform grids with an increasing number of nodes. Their purpose was to allow a more exact representation of the field in critical areas, where its spatial distribution is highly nonuniform. However, a fine uniform grid over the whole domain is wasteful, since there are subdomains where the field is smooth and a coarser grid would suffice for its representation.

In order to improve the performance of the extraction algorithms even further, we propose the use of the multigrid method in an original form. The process starts from an initial coarse grid, which is refined recursively only in critical zones, where the field is highly nonuniform. To find the subdomains where the subgridding should be used, the error indicator of the linear interpolation along the x and y directions is used:

$$\varepsilon_{ij} = \frac{|\delta_x| + |\delta_y|}{2} = \frac{|V_{i-1,j} - 2V_{i,j} + V_{i+1,j}| + |V_{i,j-1} - 2V_{i,j} + V_{i,j+1}|}{16}. \qquad (12)$$

The refining will be carried out where this error indicator has relatively high values.
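A minimal sketch of how such an indicator could be evaluated on the interior nodes of a rectangular grid of potentials. The selection rule in `cells_to_refine` (a fixed fraction of the peak value) is a hypothetical choice, since the text only says that refinement takes place where the indicator is relatively high.

```python
def error_indicator(v):
    """Interpolation error indicator on the interior nodes of a 2D grid v[i][j]."""
    rows, cols = len(v), len(v[0])
    eps = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dx = abs(v[i - 1][j] - 2.0 * v[i][j] + v[i + 1][j])
            dy = abs(v[i][j - 1] - 2.0 * v[i][j] + v[i][j + 1])
            eps[i][j] = (dx + dy) / 16.0
    return eps

def cells_to_refine(eps, fraction=0.5):
    """Hypothetical rule: mark nodes whose indicator is at least fraction * peak."""
    peak = max(max(row) for row in eps)
    if peak == 0.0:
        return []
    return [(i, j) for i, row in enumerate(eps)
            for j, e in enumerate(row) if e >= fraction * peak]

# A linear field is interpolated exactly, so its indicator vanishes everywhere.
linear = [[2.0 * i + 3.0 * j for j in range(5)] for i in range(5)]
eps_linear = error_indicator(linear)

# A single spike produces a sharp maximum at the spike itself.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
eps_spike = error_indicator(spike)
marked = cells_to_refine(eps_spike)
```

The spike case imitates, in a crude way, the behaviour near a re-entrant corner: only a small neighbourhood is marked for subgridding.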

Figs. 5 and 6 show the variation of this error in the case of the L-shaped domain, for two different grids. It can be noticed that for the fine grid the error has a sharp maximum near the inner corner and much lower values over the rest of the domain. This leads to a small subgridding domain placed in the neighborhood of the inner corner.

Fig. 5. The error in 65 DOFs



Fig. 6. The error in 833 DOFs 




Fig. 7. Adaptive multigrid errors 



The variation of the error with respect to the adaptive refining level for initial 
grids with different DOFs is shown in Fig. 7. 

By applying the proposed adaptive local subgridding, the performance parameters of the multigrid FEM become $\alpha_0 = 27158$, P = 0.17.



6 Conclusions 

Complementarity is a powerful principle, which allows the extraction of parameters of passive on-chip components with accuracy control. Thus, the waste of computational resources is avoided and fast extraction algorithms are obtained.

The proposed techniques based on averaging the complementary bounds followed by Richardson extrapolation proved to be extremely efficient in conjunction with the finite element method.

In the case of the studied test structure, the acceleration yielded a decrease of the error by two orders of magnitude (100 times). To obtain the same error without using acceleration, grids with 100 times more nodes would have to be used, leading to a significant increase in the CPU time.

A further decrease of the computing time can be obtained by using appropriate solvers for the sparse linear system of equations. In this respect, the best results were obtained with the simple-descent multigrid method, where the smoothing is carried out by means of the conjugate gradient method with diagonal preconditioning. The adaptive multigrid method, based on local subgridding, is a very promising one, with many possible developments.

Abstract. We are investigating inductive coupling optimization schemes and quantization effects for microscopic metal rings as a possible basis for a quantum bit (qbit). Faraday induction is proposed to provide electromagnetic coupling between the rings, therefore acting as an information carrier. Quantizing this information will produce distinguishable ring states that can be denoted by |0⟩ and |1⟩, representing the logic states of the qbit. We have set up simulation case studies with the aim of reducing signal loss between the rings. Further, different quantization mechanisms are investigated analytically. A combination of the two concepts can in theory be used to design qbits, consisting of metal rings with I/O facilities.



1 Introduction 



Quantum computing [1,2] has recently become an active field of research, attracting increasing interest in both its theoretical and practical aspects. Quantum mechanics offers the phenomenon of superposition of states, enabling parallel computing. Unlike in classical computing, where a single bit is always in either one of the two logic states |0⟩ and |1⟩, linked to physical, measurable states, in quantum mechanics it is possible to bring a bit into a state that is an arbitrary linear combination of these basic states. Therefore, information space itself is larger as it extends to the full Hilbert space spanned by the basic states. During computation, a two-state system with orthogonal basis states |0⟩ and |1⟩ can then generally be found in a global state |q⟩ as any combination of |0⟩ and |1⟩,

$$|q\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1, \qquad (1)$$

where the complex numbers $\alpha$ and $\beta$ are the probability amplitudes of finding the system in states |0⟩ and |1⟩ respectively, while the probabilities $|\alpha|^2$ and $|\beta|^2$ add up to one.

The power of quantum computing lies in the fact that the different qbits can be coherently connected to each other during computation until the result is read out. This coherence is provided by an entangled state of superposition which the qbit register can take. For a register of n qbits, a global state |Q⟩ can be described by linearly combining the $2^n$ basis vectors of the qbits,

$$|Q\rangle = \sum_{i_1=0}^{1} \sum_{i_2=0}^{1} \cdots \sum_{i_n=0}^{1} a(i_1, i_2, \ldots, i_n)\, |i_1\rangle \otimes |i_2\rangle \otimes \cdots \otimes |i_n\rangle. \qquad (2)$$

In general, any qbit has to satisfy the following requirements [3] 

— Good description of the devices and circuits that represent the qbit. 

— High isolation from the environment to insure that coherence times are longer 
than computation times. 

— A universal, reproducible initial state. 

— The possibility of accessing the qbit to perform a controlled sequence of unitary 
transformations in order to realize quantum algorithms. 

— The ability to link up measurements with quantum probabilities. 

What’s the gain with respect to classical computing? One should focus on problems 
for which classical computers would either not find a solution or require extremely 
long computation times and then find appropriate quantum algorithms for solving 
them. By implementing the parallel computing phenomenon provided by quantum 
mechanics, great computational speed-ups can be gained. Shor’s famous quantum 
algorithm [4] for factoring large numbers has recently been realized experimentally 
using NMR techniques [5]. 

Factoring an arbitrarily large integer n can be done in reasonable time only by exploiting quantum parallelism. Consider the function

$$f_{n,a}(x) = a^x \bmod n. \qquad (3)$$

The Shor algorithm tries to find the period r of the function f(x), where n is the number to be factored and a < n is coprime to n, i.e. it has no common factors with n. Number theory allows us to rewrite the problem as $(a^{r/2} - 1)(a^{r/2} + 1) = 0 \bmod n$, provided that r is even. Any r that makes the product an integer multiple of n for an arbitrary value of a is a solution and can be used to extract factors of n:

$$\gcd(a^{r/2} - 1, n) \times \gcd(a^{r/2} + 1, n) = n, \qquad (4)$$

where "gcd" denotes the greatest common divisor.

Calculating this function for a large number of values of a would take an exponentially long time on a classical computer but can be done in one step (polynomial time) on a quantum computer, by calculating all values in superposition.
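The number-theoretic part of the procedure can be illustrated classically for a small number. The sketch below finds the period by brute force, which is exactly the step a quantum computer is meant to speed up, and then applies (4). The example n = 15, a = 7 and the function names are illustrative choices.

```python
from math import gcd

def period(a, n):
    """Smallest r > 0 with a**r = 1 (mod n), found by brute force; needs gcd(a, n) = 1."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factors_from_period(a, n):
    """Return the two gcds of equation (4), or None if this choice of a is unusable."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r = period(a, n)
    if r % 2 != 0:
        return None  # odd period: a**(r/2) is not an integer power
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # a**(r/2) = -1 (mod n) yields only trivial factors
    return gcd(half - 1, n), gcd(half + 1, n)

r = period(7, 15)
p, q = factors_from_period(7, 15)
```

For this example the period is r = 4, so $a^{r/2} = 49$, and the two greatest common divisors are 3 and 5.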



2 Microscopic metal rings as qbit basis 

The hydrogen atom has been studied intensively over the past century and can be described very well quantum mechanically, thus seemingly providing a qbit application. However, the smallness of the atomic scale and the solid-state nature of computer hardware prohibit us from doing so. If we desire a stable qbit register inside a computer, including the necessary in- and output facilities, we will have to construct artificial atoms. One interesting candidate for such a device is a set of microscopic metal rings, communicating via electromagnetic induction. In- and output
ports can exist in the form of ring-like structures connected to the outside world, 
accessible for applying or reading out alternating current signals. A ring, carrying 
such a current produces a time- varying magnetic field which can induce a current 
signal in another ring. Thus, a three-ring arrangement, consisting of an input ring, 
a free ring and an output ring could be used in principle for transferring a signal 
as well as using it for a logic operation. 

The free ring or computing element will be our operating system. Classically, 
we can distinguish between a ring containing an induced current or not where the 
amplitude of the induced signal changes linearly with respect to the driving signal. 
At this stage, it is important to set the classical basis for information transfer. The 
magnetic field lines produced by the input ring will spread out in space and only 
a limited amount (~ 20%) will go through the free ring. Thus, ways of guiding or 
focusing the magnetic flux have to be found. We have investigated three approaches 
(see section 3): 

— sandwiching the ring system between superconducting plates, in order to ex- 
plore the Meissner effect to bundle the flux, 

— introducing ferromagnetic cores to guide the flux similar to a transformer, 

— putting the free ring in a higher plane where the flux coupling is stronger 
compared to the radial direction. 

After optimizing the classical behavior, different quantization mechanisms have to 
be investigated. There are two ways of achieving information quantization, one, by 
using nano-sized rings with discrete energy-eigenstates, the other, by using super- 
conducting rings carrying persistent currents. 



3 Simulation results 

Electromagnetic induction is governed by Faraday's law: $V_{emf} = -d\Phi/dt$. Induced currents depend on input frequency and system impedance. Thus, high flux coupling is necessary to ensure the desired physical effects.



3.1 Superconducting plates 

The magnetic effect of sandwiching an input ring and a free ring between two superconducting plates was studied first, neglecting the influence of the free ring. After reaching the superconducting state, all magnetic field lines will be expelled from the plates due to the Meissner effect. Consequently, the magnetic field lines get squeezed and are confined to the region bounded by the plates. A resulting boost in the induced signal of the free ring is expected. The rings can be described as small tori of inner radius a, outer radius b and height c, separated by a minimal distance R. The circulating current density inside such a torus reads in cylindrical coordinates $(\rho, \phi, z)$

$$J_\phi(\rho, z) = \begin{cases} J(\rho) & \text{for } a < \rho < b,\ |z| < c/2, \\ 0 & \text{elsewhere.} \end{cases} \qquad (5)$$

While the current density is confined to the torus area, the magnetic field lines are squeezed within a region between the two plates, separated by a distance d (larger than the penetration depth of the superconductor) due to the Meissner effect, which is reflected in the boundary condition

$$B_z(\rho, \pm d/2) = 0. \qquad (6)$$

The latter suggests a convenient Fourier expansion for $B_z(\rho, z)$:

$$B_z(\rho, z) = \sum_{n=1}^{\infty} B_n(\rho) \sin k_n \left(z - \frac{d}{2}\right), \qquad k_n = \frac{n\pi}{d}, \qquad n = 1, 2, 3, \ldots \qquad (7)$$

and similarly for $J_\phi(\rho, z)$. Substituting the Fourier series into the Maxwell equations $\nabla \times \mathbf{B} = \mu_0 \mathbf{J}$ and $\nabla \cdot \mathbf{B} = 0$, we obtain a second order differential equation for each Fourier component.



The second ring is located in the same plane (z = 0) as the input ring. Taking the current density equal to a constant value $J_0$ within the input ring, we may solve Eq. (8) in terms of the modified Bessel functions $I_0$ and $K_0$ and obtain an expression for the magnetic field component $B_z(\rho, 0)$ for studying the effect of the input ring on the second ring:



Bz{p,0) = - 



4po Jo J 



E : 

n(odd) 



k r 

■ sin{^^) Ko{knp) / thdt . 

^ Jkna 



(9) 



The magnetic flux induced in the area S of the second ring due to the current 
flowing through the first ring, is then given by : 

^12 = j B-dS = j B.(p, 0 )dV 



8po Jod^ 



E : 

n(odd) 



knC 



, r<t>0 rknP2(4>) fknb 

■/ d(j) I xKo{x)dx Ii(t)dt. (10) 

do dknPl(4>) Jkna 



with (j)Q = sin~^(f)/i^), pi, 2(0) = -Rcos0=p Vo^~^R^sin^0. Numerical simulations 
based on Eq. (10) were carried out using a Fortran 90 program. The following 
parameters were selected: a = 1.5 μm, b = 2.5 μm, c = 0.2 μm, R = 0.1 μm, and a static input current of I = 1 mA.

There seems to exist a balanced relation between the ring separation and the plate separation. Close-by rings between close plates do not necessarily experience a signal boost. For each fixed value of either separation, there exists an optimal value for the other. Figures 1(a) to (c) show the magnetic effect (field and flux) on the second ring due to the first ring for a fixed ring separation R and increasing distance d between the plates. The slope first increases until a maximum is reached, after which it decreases asymptotically to the free field value (no plates). Physically, this is an interesting observation as it suggests that although the field lines are indeed squeezed as intended, their focusing point shifts. For closely spaced plates, the rings have to be far away from each other to be influenced at all.

Further, the boosted up signal in the second ring is only 20% higher in the optimal 
case (<?opt — 5 • 10~^® Wb) than the free field signal and coupling itself is about 
24%. In a three-ring system, the total coupling (I/O) is ~ 5%. 



Fig. 1. Magnetic behavior: (a) B-field [mT] vs. d for R = 0.1 μm (b) Free field, B [mT] (c) Flux [fWb] vs. d for different R



3.2 Ferromagnetic cores 

The effect of using ferromagnetic core structures was studied next. In the three-ring system, the free ring shares one core with each of the other two rings, without galvanic contact between the cores and/or the rings to avoid electronic coupling. The cores use the magnetic field produced by the input ring to switch their permanent magnetic moments, provided the input frequency is below about 0.5 GHz, above which the switching cannot keep up anymore. Another critical quantity is the saturation field of the ferromagnetic material, which should not be exceeded in order to enable controlled field reversal. The presence of the ferromagnetic cores confining the magnetic field lines to the core region produces a transformer-like action which is exploited to achieve efficient electromagnetic coupling between subsequent rings. Static 2D simulations were carried out for a symmetrical system, using the parameters of 3.1 [6]. The cores are separated from each other and from the ring edges by 0.1 μm, lie 0.5 μm above the rings and are 0.5 μm thick. A flux coupling of 99.5% seems possible between the first two rings, with a core error of the order of 0.1%. The total coupling in the system is about 49%, which is plausible because, from geometrical considerations, half of the flux in the second ring can be transferred to the output. In the simulations, a general core material (any soft permalloy) with a medium permeability $\mu_r = 1000$ was used (some permalloys have $\mu_r \approx 100000$). Magnetic fields of around 350 mT were reached inside the cores (figure 2b), providing a flux
of around 2 x 10“^^ Wb threading the central ring, with an improvement factor of 
the order of 100-1000 compared to the free-field case (figure 2a). 




(a) (b) 

Fig. 2. Magnetic flux lines : without (a) and with cores (b) 



3.3 Planar arrangements 

A seemingly obvious way of improving the flux coupling could be using a different 
planar arrangement. The in- and output rings are kept in the ground plane, while 
the free ring is placed in a higher plane such that half of its area lies above half of 
the in- and the output ring respectively. The magnetic field lines of the input ring 




Fig. 3. Planar effect 



are very dense in a region perpendicular to the ring plane. Direct coupling between the in- and output rings is low due to a rapid decrease of magnetic interaction in the radial direction (∝ free field coupling ~ 20%). Using the same input parameters as above, and putting the free ring 1 nm above the other two rings,
static simulations were carried out [7]. A flux coupling of about 70 % is achieved 
between the input ring and the free ring and a total system coupling of about 49 
%, with a corresponding flux of about 7 x 10~^^ Wb through the free ring (see 
Fig. 3). However, this system requires good calibration for error control, providing 
difficulties during experimental realizations. 



4 Quantization effects 



This section states a brief analytical summary of the two different quantization 
concepts. The final equations will be used for future simulations [8]. 



4.1 Discrete states for a nano-ring 

Electrons in a nano-ring of dimensions a = 15 nm, b = 50 nm and c = 5 nm are described by discrete eigenstates $|\psi_{m,n,p}\rangle$ and eigenenergies $E_{m,n,p}$. Starting from the Schrödinger equation and a Hamiltonian that includes a vector potential term, the current density in the ring can be derived. For the geometries chosen in this paper, the total current is described to a good approximation by



/ = |j-dS = 2 - 



eh 



‘1 Ale'll pQ 



m,n,p 



+ exp(- 



( 11 ) 



m, n, p are the quantum numbers labeling the eigenstates, $\Phi$ is the driving flux and $\Phi_0 = h/e$ denotes the Dirac flux quantum, while S now denotes a cross section of the ring. The chemical potential $\mu$ is determined by the total number of electrons residing in the ring.



4.2 Persistent current in a superconducting ring 

A superconducting ring, independently of its size, traps magnetic field lines inside its central hole for as long as the temperature stays below the transition point. This results in a persistent current flowing around the edges of the ring. Starting from the Ginzburg-Landau theory, the current density can be derived by minimizing the Gibbs free energy with respect to the order parameter $\psi(\rho, \phi, z) = R(\rho, z)\, e^{i\theta(\phi)}$ and the vector potential $\mathbf{A}$,

$$\mathbf{J}_s = \frac{ie\hbar}{2M_e}\left(\psi^* \nabla\psi - \psi \nabla\psi^*\right) - \frac{2e^2}{M_e}|\psi|^2 \mathbf{A} = -\frac{eR^2}{M_e}\left(\hbar\nabla\theta + 2e\mathbf{A}\right), \qquad (12)$$

where $-2e$ and $2M_e$ are the charge and the mass of a Cooper pair. In the bulk part of the ring, both the magnetic field and the current density are known to vanish. Hence, putting $\theta(\phi) = m\phi$ for integer m, we have

$$J_s(\text{bulk}) = -\frac{e\hbar R^2}{M_e \rho}\left(m + \frac{\Phi}{\Phi_0}\right) = 0, \qquad (13)$$

reflecting flux quantization with respect to the London flux quantum $\Phi_0 = h/2e$, while the supercurrent $J_s$ manifests itself as a surface current.
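For orientation, the two flux quanta that appear in this section can be evaluated numerically. The sketch uses the exact SI values of the Planck constant and the elementary charge; the helper `flux_in_quanta` and the example flux of 0.5 fWb are illustrative assumptions, not results from the simulations.

```python
PLANCK_H = 6.62607015e-34           # Planck constant in J s (exact in the SI)
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge in C (exact in the SI)

# Dirac flux quantum h/e (nano-ring) and London flux quantum h/2e (superconductor).
dirac_quantum = PLANCK_H / ELEMENTARY_CHARGE
london_quantum = PLANCK_H / (2.0 * ELEMENTARY_CHARGE)

def flux_in_quanta(flux_wb, quantum_wb):
    """Express a magnetic flux (in Wb) as a multiple of the given flux quantum."""
    return flux_wb / quantum_wb

# Illustrative flux value only (0.5 fWb), not a simulation result.
example = flux_in_quanta(0.5e-15, london_quantum)
```

Both quanta are of the order of femtowebers, so fluxes on the femtoweber scale correspond to only a few flux quanta.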



5 Conclusions and acknowledgments 

Our studies indicate that ring structures offer interesting physical properties and 
can be used to transfer information. Quantization of information can be achieved 
by using a superconducting ring material below its transition temperature. This 
seems feasible from a production point of view as there are no requirements on the 
dimensions. The difficult part lies in optimizing the flux coupling between the rings 
and the best solution is to use ferromagnetic cores like in a macroscopic transformer; 
99.5% of the input flux reaches the computing element with low system error. 
Combining these mechanisms results in a system that can theoretically exploit the 
necessary effects needed for a qbit. The advantage of such a qbit over other qbit 
layouts is that it is in solid-state, easily scalable and reproducible, thus making it 
suitable for chip design and silicon processing technology. 

We are very much indebted to F. Henrotte from ESAT, University of Leuven.

Abstract. The self-inductance of the operating coil of a magnetizing device is
calculated using different methods. The winding of the coil under investigation 
basically consists of copper sheets with rectangular concentric inner and outer con- 
tours. These plates form the turns of a Bitter coil. They are stacked together with 
an electric insulation between them and connected in series to form a helix-like 
winding. Analytical formulae for cylindrical coils can only be applied as a coarse 
approximation due to the rectangular cross-section and because of exact geometric 
measures of current paths such as for arrangements of filamentary wires not being 
available. More reliable results are obtained, if first self- and mutual inductances 
of all turns are determined according to Neumann’s formula and the resulting self- 
inductance is determined afterwards with respect to the connection for all turns in 
series. A 3D-FEM analysis is carried out in order to verify the method described 
above and to judge the influence of eddy-current phenomena, i.e. the skin-effect, 
which might become important in the usual transient operation mode. 



1 Introduction 

In industrial magnetizers a battery of capacitors is discharged via an operating 
coil, in the interior of which unmagnetic workpieces are placed. The no-load self- 
inductance of this coil plays a certain role among the data to be known for dimen- 
sioning purposes. 

Figure 1 shows a scheme of the operating Bitter coil under investigation in- 
cluding relevant measures. Ferromagnetic materials are not contained within the 
arrangement under no-load condition. 



2 Coarse Analytical Estimations 

The coarsest estimation for the self-inductance of the coil in Fig. 1 is based on 
assuming a homogeneous magnetic field in the interior and using an average cross- 
section. Thus L becomes: 



$$L = N^2 \mu_0 \frac{a\, b}{l_z}, \qquad a = \tfrac{1}{2}(a_{ext} + a_{in}), \qquad b = \tfrac{1}{2}(b_{ext} + b_{in}). \qquad (1)$$

W. H. A. Schilders et al. (eds.), Scientific Computing in Electrical Engineering
© Springer-Verlag Berlin Heidelberg 2004

Inserting the given data into (1) one obtains L = 6.76 mH, which must be considered an upper bound only. Since $l_z \gg a$ and $l_z \gg b$ do not hold, the assumed field uniformity is not given; flux linkage contributions decrease towards the axial ends of the coil and thus the real value of L is lower.

By taking into account the decrease of the flux density along the positive and negative z-axis, a more accurate result can be calculated. For solenoids the following formula can be applied, see e.g. [1]:

$$L = N^2 \mu_0 \frac{\pi R^2}{l_z}\left[\sqrt{1 + \left(\frac{R}{l_z}\right)^2} - \frac{R}{l_z}\right]. \qquad (2)$$

In order to use (2) for the given geometry, the equivalent radius of a solenoid comparable to the rectangular coil should be chosen with respect to equality of average cross-sectional areas, $\pi R^2 = a \cdot b$. With (2), L then becomes 4.99 mH.
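The two coarse estimates can be reproduced with a few lines of arithmetic. The sketch assumes the coil data of Fig. 1 (N = 200, heights 204/140 mm, widths 384/320 mm, axial length 450 mm) and the uniform-field and solenoid formulas in the form $L = N^2 \mu_0 ab/l_z$ and $L = N^2 \mu_0 (\pi R^2/l_z)[\sqrt{1 + (R/l_z)^2} - R/l_z]$; the variable names are illustrative.

```python
import math

MU0 = 4.0e-7 * math.pi          # vacuum permeability in H/m (classical value)
N_TURNS = 200
A_EXT, A_IN = 0.204, 0.140      # outer and inner height in m
B_EXT, B_IN = 0.384, 0.320      # outer and inner width in m
L_Z = 0.450                     # axial length in m

# Average cross-section of the rectangular coil.
a = 0.5 * (A_EXT + A_IN)
b = 0.5 * (B_EXT + B_IN)
area = a * b

# Uniform interior field assumed: upper bound for the inductance.
l_uniform = N_TURNS ** 2 * MU0 * area / L_Z

# Solenoid with equivalent radius from pi * R**2 = a * b, including end effects.
radius = math.sqrt(area / math.pi)
ratio = radius / L_Z
l_solenoid = l_uniform * (math.sqrt(1.0 + ratio ** 2) - ratio)
```

Expressed in mH, the two values round to 6.76 and 4.99, matching the figures quoted in the text.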



3 Semi- Analytical Approach 



In the semi-analytical approach Neumann's formula is evaluated for the mutual inductance between filamentary rectangular wire loops. For two parallel conductors with different lengths $l_1$ and $l_2$, symmetrically placed at a distance v, the integral can be solved analytically, giving the contribution $\Delta M$ of this pair of conductors to the resulting mutual inductance M.
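The closed-form expression (3) is not reproduced here, but the underlying Neumann integral for two parallel filaments, $\Delta M = \frac{\mu_0}{4\pi} \int\!\!\int dx_1\, dx_2 / \sqrt{(x_1 - x_2)^2 + v^2}$, can be evaluated numerically as a cross-check. The sketch below uses a simple midpoint rule and compares it with the well-known closed form for two filaments of equal length. The function names and the test dimensions are assumptions made for illustration.

```python
import math

MU0 = 4.0e-7 * math.pi  # vacuum permeability in H/m (classical value)

def mutual_parallel_numeric(l1, l2, v, steps=200):
    """Midpoint-rule Neumann integral for two parallel, centred filaments.

    l1, l2: filament lengths in m; v: distance between the filaments in m.
    """
    h1 = l1 / steps
    h2 = l2 / steps
    total = 0.0
    for i in range(steps):
        x1 = -0.5 * l1 + (i + 0.5) * h1
        for j in range(steps):
            x2 = -0.5 * l2 + (j + 0.5) * h2
            total += 1.0 / math.sqrt((x1 - x2) ** 2 + v * v)
    return MU0 / (4.0 * math.pi) * total * h1 * h2

def mutual_parallel_equal(length, v):
    """Closed form for two parallel filaments of equal length at distance v."""
    return MU0 / (2.0 * math.pi) * (
        length * math.asinh(length / v) - math.sqrt(length ** 2 + v ** 2) + v
    )

m_numeric = mutual_parallel_numeric(0.3, 0.3, 0.05)
m_exact = mutual_parallel_equal(0.3, 0.05)
```

The sign with which each such contribution enters the sum for M still has to be chosen according to the current orientation of the two conductors.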

By contrast to [2], where a program for the calculation of mutual inductances 
between arbitrary polygonal current loops is described including expressions for 
all possible positions of straight conductors, (3) suffices in the present situation. 
The program developed and used in [3] for the determination of end zone leakage 
inductances of induction motors also contains a subroutine for the general treatment 
of arbitrary polygonal loops. 

By (3) the magnitude of the contribution $\Delta M$ to M is calculated. Its actual sign in the summation depends on the current orientation. Pairs of conductors with perpendicular orientation do not contribute to M. For calculating M between two turns, the turn considered to be the exciting current loop is imagined to be concentrated in the rectangular middle line, whereas the flux receiving turn is represented by its inner and outer contour.

Fig. 1. Scheme of the magnetizer coil investigated here. Measures: outer height $a_{ext}$ = 204 mm, inner height $a_{in}$ = 140 mm, outer width $b_{ext}$ = 384 mm, inner width $b_{in}$ = 320 mm, axial length $l_z$ = 450 mm. Number of turns: N = 200.

The turns are numbered consecutively, and the mutual inductance between two of them, denoted $M_{t,k}$ here, only depends on the index difference k. Referring to the expression given in (3), the mutual inductances between the excited rectangular current loop and the flux receiving contours, $M_{in,k}$ and $M_{ext,k}$, can be determined as shown in (4). The expressions occurring as arguments for (3) are illustrated in Fig. 2.



1 1 

A7 ext ^ — 2 •EE {—ly AM {xi ± d , Xi , Vij) with xo = a, xi = b, 

i=0 j=0 



^ 2 



■Xl-i±^)^ + { 



L 

N -1 



■kY 



( 4 ) 



The values for $M_{ext,k}$ and $M_{in,k}$ are averaged, which is considered an adequate approximation for the mutual inductance $M_{t,k}$ between the two turns under consideration.

The analytical calculation of mutual inductances between filamentary rectangular loops is also described in [4], where besides the axial distance a lateral displacement $\Delta$ of the loops is also taken into account. This just equals zero in the present arrangement. But unlike here, the width b in [4] is assumed to be the same for the two loops.



Fig. 2. Longitudinal section of the coil investigated. Illustration of turn 
numbering and relevant distances and measures for the calculation 
of turn to turn mutual inductances. 

By formally including an index difference k = 0 in (4), the self-inductance 
L_t of a single turn can also be determined, where the inner inductance is taken into 
account as an additional contribution. The summation of all inductances results in: 

$$L_{\mathrm{res}} = N \underbrace{\left(M_{t,0} + \frac{\mu_0}{4\pi}(a+b)\right)}_{L_t} + 2 \sum_{k=1}^{N-1} (N-k)\, M_{t,k} \tag{5}$$



A suitable algorithm for (3), (4), and (5) can be easily programmed even on 
a pocket calculator. With the data given in Fig. 1 a value of Lres = 4.52 mH is 
obtained. 
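The weighting in (5) can be illustrated with a short sketch. It only shows the bookkeeping of the summation: the coupling law `toy_mutual` and the numerical values are hypothetical placeholders and do not implement (3) or (4). The function names are likewise chosen for illustration.

```python
def total_inductance(n_turns, self_inductance, mutual):
    """Sum all turn inductances as in (5).

    self_inductance: inductance L_t of a single turn in H.
    mutual: callable k -> M_t,k for an index difference k >= 1, in H.
    """
    total = n_turns * self_inductance
    for k in range(1, n_turns):
        # (n_turns - k) pairs of turns have the index difference k,
        # and each pair enters twice (M_ij and M_ji).
        total += 2 * (n_turns - k) * mutual(k)
    return total


def total_inductance_bruteforce(n_turns, self_inductance, mutual):
    """Reference: explicit double sum over all ordered pairs of turns."""
    total = 0.0
    for i in range(n_turns):
        for j in range(n_turns):
            total += self_inductance if i == j else mutual(abs(i - j))
    return total


def toy_mutual(k):
    """Hypothetical coupling law, used only to exercise the summation."""
    return 1.0e-7 / (1.0 + k)


fast = total_inductance(200, 1.5e-7, toy_mutual)
slow = total_inductance_bruteforce(200, 1.5e-7, toy_mutual)
print(f"weighted sum: {fast:.6e} H, double sum: {slow:.6e} H")
```

Because the mutual inductance depends only on the index difference, the N² terms of the double sum collapse to N terms, which is presumably what makes the evaluation feasible on very small hardware.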

The method outlined above can be improved by discretising the conducting 
plates, i.e. the flux receiving turn as well as the current excited one, by a number of 
concentrically arranged partial rectangular filamentary loops, e.g. N_p = 200, and 
then calculating the flux-linkage and the mutual inductance M_t,k, respectively, 
between them. The formula for M_t,k in this case must be modified to become the 
double sum expression in (6), which is still based on (3), 
with x standing for a or b and n for i or j. 

(6) 



The self-inductance of a single turn cannot be determined by admitting i = 
j during the summation for k = 0. Instead, this particular case is treated in a 
manner similar to the considerations for M_t,0 above. The imaginary inner and outer 
contours of a partial conductor are considered to be located at a distance of d/(2N_p) 
inwards and outwards from that partial conductor. Its self-inductance is therefore 
determined by evaluating (4) for k = 0 with a_i and b_i from (6) substituting a and b 
in (4) and d/N_p instead of d. For i ≠ j the sum remains the same as in (6). Because 
of this calculation of single turn self-inductances and a relatively fine discretisation, 
additional inner inductances of partial conductors appear to be negligible here. 

The result for Lres then becomes 4.80 mH. In this approach current paths are 
considered to be rectangular even in the corners of the plates. Since a uniform 
current distribution on the turn cross-sections is assumed, skin-effect influences 
occurring in ac- and transient operation are not taken into account. 



4 3D-FEM-Computation 

The FEM-program EMAS [7-9] is used for determining the 3D-field distribution. 
AC-computations for three different frequencies are carried out, 10“^ Hz for 
dc-operation, 100 Hz and 2.5 kHz for higher frequencies. The calculated supply voltage 
for a prescribed exciting current I_eff of 1 kA is a direct measure for the impedance 
of the coil. 

Due to the symmetry of the arrangement the FEM-model may be restricted to 
the first octant of the rectangular coordinate system. The vectors of current density 
are perpendicular to the xz- and the yz-planes. The magnetic field is tangential to 
them and the vector potential on them is therefore constrained to have a normal 
component only. 

Discretising each single turn would probably result in an excessive number of 
nodes and thus huge computational effort. Therefore the windings are modelled as 
a massive conducting block as shown in Fig. 3. An effective conductivity is obtained 
by averaging insulating and conducting cross-sections of the real arrangement. Even 
for ac-considerations this appears to be justified by the one-dimensional skin-effect 
in rectangular slot conductors. There, the slot width w_s and the conductor width 
w_c are taken into account in the reduced conductor height [5] ξ = βh, which occurs 
in Field’s formulae, by the factor β also mentioned in [6]: 

$$\beta = \sqrt{\frac{w_c}{w_s}\, \pi f \gamma \mu} \tag{7}$$

In (7) the quotient w_c/w_s can be considered a weighting factor for the conductivity 
γ of the conductive bar: In a slot filled over its full width with a material of this 
reduced conductivity the same reciprocal depth of penetration β would occur. In 
the present arrangement the situation, at least for winding plates in the centre part 
of the coil, should be comparable due to the mainly axial field transversally oriented 
to the winding plates and thus similar to the field inside a slot conductor. 

With an exemplary thickness of t = 2 mm and N = 200 turns of copper plates 
with γ_Cu = 56 · 10^6 S/m on an axial length of l_z = 450 mm the conductivity 
prescribed in the FEM-model becomes γ_mod = γ_Cu · t/(l_z/N) = 49.78 · 10^6 S/m. 
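The averaging of the conductivity and the resulting depth of penetration can be checked with a few lines of arithmetic. The sketch below assumes a non-magnetic winding (μ = μ0) and applies (7) to the homogenised block, so the ratio w_c/w_s is already contained in γ_mod and the fill ratio is set to 1. The function names are illustrative only.

```python
import math

MU_0 = 4.0e-7 * math.pi  # vacuum permeability in H/m (assumed for the winding)


def model_conductivity(gamma_cu, plate_thickness, axial_length, n_turns):
    """Effective conductivity gamma_mod = gamma_Cu * t / (l_z / N) in S/m."""
    return gamma_cu * plate_thickness / (axial_length / n_turns)


def penetration_depth(frequency, gamma, fill_ratio=1.0, mu=MU_0):
    """Depth of penetration 1/beta with beta from (7), in m."""
    beta = math.sqrt(fill_ratio * math.pi * frequency * gamma * mu)
    return 1.0 / beta


gamma_mod = model_conductivity(56e6, 2e-3, 450e-3, 200)
delta_100 = penetration_depth(100.0, gamma_mod)
delta_2500 = penetration_depth(2500.0, gamma_mod)

print(f"gamma_mod      = {gamma_mod / 1e6:.2f} * 10^6 S/m")
print(f"delta(100 Hz)  = {delta_100 * 1e3:.2f} mm")
print(f"delta(2.5 kHz) = {delta_2500 * 1e3:.2f} mm")
```

Under these assumptions the sketch reproduces the conductivity of 49.78 · 10^6 S/m and the penetration depths of 7.13 mm and 1.43 mm quoted in this section.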

In order to prescribe current paths regarding the winding structure, i.e. lying 
parallel to the xy-plane, the conductivity has to be anisotropic with only the 
tensorial xx- and yy-components being different from 0. Since field exciting current 
densities only have x- and y-components, the vector potential may also be restricted 
to these components, A_z = 0. 

The winding characteristic has to be modelled by combining two different 
constraint techniques. On the one hand, for each winding plate in the general ac-case 
the current-density distribution may be nonuniform, but the cross-section of the 
turn is equipotential. Therefore degrees-of-freedom No. 4, i.e. time-integrated 
electric potentials ψ, of each layer of nodes perpendicular to the z-axis on the input 
cross-section are equalized with the unknown independent time-integrated electric 
potential ψ_k of an external collective node assigned to this layer. On the other hand, 
the current is the same in every winding plate, but their contributions to the total 
voltage may differ depending mainly on their flux-linkage contributions and 
partially in the ac-case on different ohmic voltage drops. Therefore another external 
collective node is inserted into the model with the constraint that its time-integrated 
electric potential ψ_tot is the sum of the contributions of all winding plates. 
However, as mentioned above, single turns do not occur in the FEM-model. But with 
respect to the constant axial winding density the potentials of the layer collective 
nodes can be used in the constraint equation, taking also into account a desirable 
nonuniform discretisation in axial direction. If there are n + 1 layers of nodes on the 
input cross-section in subsequent distances Δz_1 ... Δz_n with collective potentials 
ψ_1 ... ψ_n+1, the constraint equation for the total collective potential ψ_tot can be 
written as follows: 



^tot — -j- • ( 2 2 Azk-j-l) ^k+l ■( — ^n+l) 



( 8 ) 



Note that this approach is not applicable for a complete electrodynamic analysis 
including transient electric field calculation, which would be desirable for 
determining inter-turn electric fields and judging insulation behaviour. Since electric fields 
are not of interest in this example except inside the conducting region, ψ = 0 is set 
for all nodes outside. The vector potentials of all external collective nodes are set 
to zero. The surrounding infinite space is modelled by a spherically shaped layer of 
open-boundary elements with a radius of R = 350 mm in the given example. 




Fig. 3. Finite element grid of winding 
block and outer surface, discretisation 
of surrounding air not included. Ex- 
ternal collective nodes of input cross- 
section layers and for total voltage. 
Constraints for electric scalar poten- 
tials of winding block. Total number of 
nodes 33130, total number of elements 
32940. 



For the quasi dc-case with 10“^ Hz an inductance L of 4.76 mH is calculated. 
As expected this value is diminished by the skin-effect at higher frequencies, which 
makes the current concentrate towards the inner surface of the coil and thus reduces 
its flux-linkage: L = 4.34 mH at 100 Hz and 4.11 mH at 2.5 kHz. An increase of the 
ohmic resistance is also observed. 

The current concentration can approximately be taken into account in the 
semi-analytical calculation by modifying the geometric measures with respect to the 
depth of penetration from (7). Replacing d by δ = β^-1 and inserting 
a_in + δ(f) instead of a as well as b_in + δ(f) for b in (6), one obtains L = 4.34 mH 
at f = 100 Hz and 4.16 mH at 2.5 kHz. These values deviate less than 2 % from the 
FEM results. The procedure appears to be justified by the low values of δ for the 
selected frequencies, 7.13 mm and 1.43 mm, which are much less than the winding 
block thickness d. 

The current density distribution for f = 100 Hz as shown in Fig. 4 clarifies the 
quasi one-dimensional skin-effect close to the centre cross-section of the coil, but 
also indicates a planar distribution of current density varying in axial direction near 
the coil end, where the magnetic field weakens and spreads out. 

As an illustration of the field distribution a vector plot of the flux density B 
for f = 100 Hz in the xz-plane is shown in Fig. 5. From the axis to the inner 
surface of the coil a slight increase of the magnitude is observed. Inside the winding 
block B decreases to zero and finally changes its orientation within a distance of 
a few millimeters. This also indicates the skin-effect inside the winding. Significant 
deviations from the main axial direction of the inner coil field only occur in the end 
zone of the coil and inside the winding block. 




Fig. 4. Distribution of effective magnitudes of the current density 
in the winding block of the FEM-model at a supply frequency of 
f = 100 Hz. Exciting current I_eff = 1 kA. Average current density J_av = 
13.89 A/mm². Axial current density N · i/l_z not depending on z, 
similar to an equivalent current sheet of constant 444 A/mm. 

Fig. 5. Vector plot of magnetic flux density (real part) in the xz-plane 
of the FEM-model at a supply frequency of f = 100 Hz. Exciting 
current I_eff = 1 kA. (By contrast to 2D-problems, fieldlines even in 
symmetry planes cannot be obtained for real 3D-arrangements by just 
keeping the normal component of A constant. Instead, (∇ × A) × dl = 0 
would have to be solved directly.) 



5 Conclusion 

Rectangularly shaped Bitter coils used in magnetizers require adapted methods 
for the calculation of the no-load inductance. By simple analytical formulae only 
estimations of unreliable accuracy are obtained. 

An inductance calculation based on Neumann’s formula is easily programmed, 
but in order to achieve a reliable value for L later confirmed by FEM the winding 
plates have to be represented by a sufficiently high number of partial conductors. 

Concerning the 3D-FEM calculation of coil arrangements a helpful modeling 
technique for special kinds of winding regions is presented. The equipotential 
constraint for turn cross-sections is combined with a summation of the voltage 
contributions over all turns and the prescription of an anisotropic conductivity with 
respect to possible current paths. 

Abstract. In this paper we will discuss the EXOR function synthesized with a 
modulo function. As an implementation of the modulo function we will use the 
single-electron tunneling (SET) electron-box as a basic structure. A SET electron- 
box consists of one SET junction in series with a normal capacitor. This basic 
electron-box structure is extended with a number (equal to the number of inputs) 
of normal capacitors. 

Simulations are carried out with SIMON [1], a simulator commonly used in 
SET electronics, and indeed show the expected outcome. 



1 Introduction 

An electron can tunnel through an insulator if the distance between the conduc- 
tors is small enough. A metal-insulator-metal structure is called a single-electron 
tunneling (SET) junction. Due to this layout a SET junction can be modelled as a 
capacitor, when no tunnel events occur. 

When only a number of junctions and capacitors are connected to one node, this 
node becomes a floating node, which is called an island. On such a floating node 
electrons can be stored or erased by tunneling through one of the connected SET 
junctions, creating an independent island charge. Due to the storage of discrete 
independent charge a periodic behaviour arises (discussed in more detail in Sect. 2). 

Because the relevant device dimensions are in the nanometer range, SET junc- 
tions belong to the class of nano-electronic components. Generally, the use of na- 
noelectronics is motivated by several arguments [2,3] such as: basic devices can be 
very small, nanoelectronics has the potential to operate with very low supply power, 
and quantum properties that appear at nanometer scale in principle represent an 
increase in signal-processing power. Periodicity is one example which can increase 
the signal-processing power of a structure. 

Only a few people use the periodic behaviour of SET junctions for design 
purposes. Examples of the use of this periodicity are 

— Weight factors of neural nets [4] 

— Fault tolerance by using periodicity as redundancy [5-7] 

Modulo functions are another example of the use of periodic behaviour. Many 
articles have been written about applications of modulo functions, Chinese Remain- 
der Theorem (CRT), and Residue Number Systems (RNS) [8]. In those articles the 
usability of the proposed circuits and architectures depends on the implementation 
of the necessary modulo functions. The advantages of those applications in speed 
and chip area compared to a standard circuit solution, depend on the way the 
modulo functions are implemented. 

With the help of a modulo function it is easy to create an EXOR function. For 
a binary EXOR function with k inputs we can write 



$$\mathrm{out} = \left[\sum_{i=1}^{k} \mathrm{in}_i\right] \pmod{2} \tag{1}$$
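Relation (1) can be checked exhaustively for a small number of inputs. The following sketch compares the sum modulo 2 with the usual bitwise definition of the EXOR; the function name `exor_mod2` is chosen for illustration only.

```python
from functools import reduce
from itertools import product
from operator import xor


def exor_mod2(inputs):
    """k-input EXOR written as the sum of the binary inputs modulo 2, cf. (1)."""
    return sum(inputs) % 2


# Exhaustive comparison with the bitwise definition for k = 1 ... 4 inputs.
for k in range(1, 5):
    for bits in product((0, 1), repeat=k):
        assert exor_mod2(bits) == reduce(xor, bits)

print("sum modulo 2 agrees with the bitwise EXOR for up to 4 inputs")
```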



In Sect. 2 we explain the origin of the periodicity of SET circuits. In Sect. 3 the 
basic structure which is used in this paper, the electron-box, is described as well as 
its behaviour as a modulo function. With the help of the electron-box structure an 
EXOR function is synthesized (Sect. 4) and simulated (Sect. 5). We end this paper 
with the conclusions (Sect. 6). 



2 Origin of Periodic Behaviour 

A floating node (a so-called island) can contain an independent island charge q_node. 
As we have shown in [9], this independent island charge can be modelled by a 
current source with value i = q_node δ(t) (see Fig. 1a). To keep the node a floating 
node, the input voltage u_in should be capacitively coupled (shown in Fig. 1a). The 
voltage source can be replaced by a current source using the source transformation 
principle, as shown in Fig. 1b. 





Fig. 1. a) An island of which the independent charge q_node is modelled by a current 
source. To the same island the input voltage source u_in is capacitively coupled. b) 
The voltage source u_in is changed into a current source i_in using source 
transformation (in the Laplace domain), i_in(t) = L^-1{s C_in U_in(s)}. 

We assume here that the independent charge on the node, q_node, only consists 
of an integer number of electrons n. This is a correct assumption when a SET 
junction is connected to the island under investigation, since only an integer 
number of electrons can tunnel through a SET junction. Let n be the number of 
independent electrons and e the electron charge; then we can write for the 
independent island charge q_node = ne. 

The sum of the two current sources drawn in Fig. 1b can become equally large 
for different island charges, when the discrete independent island charge is decreased 
by one electron and the input voltage u_in is changed from situation z to situation 
z + 1. In the Laplace domain this can be written as: 

$$U_{\mathrm{in},z}(s)\, s C_{\mathrm{in}} + n e = U_{\mathrm{in},z+1}(s)\, s C_{\mathrm{in}} + (n-1) e \tag{2}$$

To calculate the necessary change in input voltage we can rewrite this equation 
into: 

$$\Delta U_{\mathrm{in}}(s) = U_{\mathrm{in},z+1}(s) - U_{\mathrm{in},z}(s) = \frac{e}{s C_{\mathrm{in}}} \tag{3}$$

Hence the periodicity of the transfer function of a structure which contains an input 
voltage source capacitively coupled to an island is equal to 

$$P = \frac{e}{C_{\mathrm{in}}} \tag{4}$$

So the periodicity (expressed as a voltage) is always determined by the capacitance 
through which the input voltage is coupled. 
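To get a feeling for the magnitudes involved, the period can be evaluated for a few coupling capacitances. The sketch assumes that the period is the elementary charge divided by the coupling capacitance, as derived above; the capacitance values and function names are arbitrary examples.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in C


def voltage_period(c_in):
    """Voltage period e / C_in of the transfer function; c_in in F."""
    return E_CHARGE / c_in


def input_capacitance(period):
    """Coupling capacitance in F that yields a desired voltage period in V."""
    return E_CHARGE / period


for c_in in (10e-18, 80e-18, 1e-15):
    period_mv = voltage_period(c_in) * 1e3
    print(f"C_in = {c_in * 1e18:7.1f} aF  ->  period = {period_mv:.3f} mV")
```

A capacitance in the range of some tens of attofarads thus corresponds to a period of a few millivolts.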



3 The Electron-Box 

One of the basic SET structures is the electron-box. The electron-box consists of 
a SET junction with a capacitance C_j and one ’normal’ (non-tunneling) capacitor 
C_c. Between these two components an island is created. 

For the output voltage u_out of the electron-box of Fig. 2a the following relation 
holds: 

$$u_{\mathrm{out}} = \frac{u_{\mathrm{in}} C_c + q_i}{C_\Sigma} \quad \text{with } C_\Sigma = C_c + C_j \tag{5}$$

The charge q_i is the total independent charge on the island, which is equal to ne 
(see Fig. 2c). 

The critical voltage u_j^cr, which is the voltage at least necessary to let an electron 
tunnel [10], is equal to 

(6) 

Fig. 2. a) The basic electron-box structure, b) its periodic voltage transfer function, 
which can be used to implement a ’mod’ function, and c) the independent island 
charge, which can be used to implement a ’div’ function. 



In [11] we have shown that we can use one tunnel transition in the transfer 
function to implement negative logic, like the NAND, NOR, and the inverter. In 
this way we use only a small fraction of the complete transfer function. Instead of 
looking at one transition, we can use the periodic behaviour of the transfer function 
to implement more complex digital logic based on modulo functions, like the EXOR. 

When the input consists of discrete possible amplitudes, the continuous transfer 
function of Fig. 2b changes into a discrete function, which as a consequence of the 
periodic behaviour can be transferred into a modulo function (see Fig. 3). 

The applied discrete input values create several small regions within the total 
transfer function. In those regions the input signal can be expected. We will call 
those regions operating windows. The width of the operating windows in case of 
digital circuitry gives the allowed error in amplitude of the applied signals. 



4 The EXOR Function 

Up to now most of the (digital) circuits made with SET junctions are 
transformations of standard CMOS logic, in which many SET junctions are needed. 
For example, Jeong et al. [12] proposed an EXOR function with 16 SET junctions, 
which is a direct transformation from a CMOS EXOR function. This design approach 
has many limitations; for example, many SET junctions are needed, 
which are still hard to make. 

We propose an EXOR which is based on the periodicity obtained from the 
transfer function of a SET electron-box structure. The EXOR function implemented 
with the electron-box has no gain, which can simply be solved by buffering the 
EXOR before coupling it to other structures. 

To implement an EXOR function, the operating windows should be placed in 
such a way that the output voltage is low when the sum of the inputs is even and 
high when the sum is odd. In Fig. 3c the transfer function 
of a multi-input electron-box is shown; the number indicates the sum of the inputs 
that are high. 



Fig. 3. a) Only some input voltages are possible (grey areas). b) The discrete 
transfer function showing a modulo 3 implementation. c) The transfer function of 
a multi-input electron-box with the chosen operating windows to create an EXOR 
function. 



The distance between two operating windows is constant, and should be equal 
to the amplitude of the input signal. The electron box should be biased such that 
it reaches operating window ’0’ without signals at the input. 

We wish to implement an EXOR function with k inputs (see Fig. 4a for k = 2 
inputs). The SET structure of Fig. 4a will have a transfer function as shown in Fig. 
4b. For the output voltage we can write 

$$u_{\mathrm{out}} = \frac{C_j u_x + C_c u_c + n e + \sum_{i=1}^{k} u_{\mathrm{in},i} C_{\mathrm{in},i}}{C_\Sigma} \tag{7}$$

with C_Σ = C_j + C_c + Σ_i C_in,i. Let u_0 be the output voltage when all the input 
voltages are zero and no electron has tunneled, so (7) can be rewritten into 

$$u_0 = \frac{C_j}{C_\Sigma}\, u_x + \frac{C_c}{C_\Sigma}\, u_c \tag{8}$$

And for the critical output voltage we can write (see (6)) 

$$u_{\mathrm{out}}^{\mathrm{cr}} = u_j^{\mathrm{cr}} + u_x \tag{9}$$

The voltage drop due to a tunnel event is equal to e/C_Σ. With the help of these 
equations we can determine all the values of the components. 

We assume that we have an input signal with an amplitude of 1 mV. The EXOR 
function is modulo 2, so we need two input signals with a high amplitude to go to 
the next period (see Fig. 3c). Using input signals with an amplitude of 1 mV, the 
period should be equal to 2 mV. For simplicity we assume that the desired EXOR 
has two inputs, of which the input capacitances are equal (C_in,1 = C_in,2 = C_in). 
The period is given by (4), which makes it possible 
to derive the value of the input capacitors: C_in = 80 aF. 

Fig. 4. a) A two-input electron-box with two extra bias sources u_x and u_c to shift 
the transfer function. b) The transfer function of the two-input electron-box with 
bias sources. 

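If we want to use only positive signals, the whole transfer function of Fig. 4b 
should be lifted, so 

$$u_{\mathrm{out}}^{\mathrm{cr}} = \frac{e}{C_\Sigma} \tag{10}$$

Combining (10) with (9) we find for the voltage source u_x: 

$$u_x = \frac{e}{C_\Sigma} - u_j^{\mathrm{cr}} \tag{11}$$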



Because a starting point u_0 = 0 is desired, the voltage source u_c (assuming no 
electrons have tunneled yet) can be calculated using (8): 

$$u_c C_c = -u_x C_j \tag{12}$$



Only the amplitude of the output voltage depends on the capacitance of Cj and 
Cc, so we are free to choose those. To get a signal which is as high as possible the 
capacitance values should be chosen as small as possible. 
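The design steps of this section can be retraced numerically. The sketch below rests on two assumptions that are not taken from legible equations in the text: the lifted critical output voltage is read as e/C_Σ, and the critical junction voltage is taken as e/(2 C_Σ), a commonly quoted value for the electron-box. With equal capacitances of 80 aF these assumptions lead to a bias u_x of about 0.25 mV. The function name `exor_bias` is hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in C


def exor_bias(c_j, c_c, c_in, n_inputs=2):
    """Bias voltages of the electron-box EXOR; capacitances in F.

    ASSUMPTIONS: u_out_cr = e / C_sigma for the lifted transfer function and
    u_j_cr = e / (2 * C_sigma) for the critical junction voltage.
    """
    c_sigma = c_j + c_c + n_inputs * c_in
    u_out_cr = E_CHARGE / c_sigma        # assumed lifted critical output voltage
    u_j_cr = E_CHARGE / (2.0 * c_sigma)  # assumed critical junction voltage
    u_x = u_out_cr - u_j_cr              # u_out_cr = u_j_cr + u_x
    u_c = -u_x * c_j / c_c               # starting point u_0 = 0
    return c_sigma, u_x, u_c


c_sigma, u_x, u_c = exor_bias(80e-18, 80e-18, 80e-18)
print(f"C_sigma = {c_sigma * 1e18:.0f} aF")
print(f"u_x = {u_x * 1e3:.3f} mV, u_c = {u_c * 1e3:.3f} mV")
```

Changing C_j and C_c in this sketch only rescales the voltages, which illustrates why these two capacitances remain free design parameters.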



5 Simulations 

The digital EXOR function described in Sect. 4 is tested with the help of the 
simulator SIMON [1]. The test conditions were: T = 0 K, the amplitude of the input 
voltage u_in is u_in,max = 1 mV, C_in,1 = C_in,2 = C_in = 80 aF and u_x = 0.25 mV. For 
simplicity we chose for the junction capacitance C_j and the extra bias capacitance 
C_c the same capacitive value as the input capacitors, C_j = C_c = C_in. 

Because the electron will not tunnel directly when the critical voltage is reached 
but rather when the voltage across the SET junction is a bit higher, we compensate 
for those small errors. This compensation can be done by choosing u_c slightly lower 
in magnitude than u_x, so u_c = −0.24 mV. 

The simulation results as a function of time are shown in Fig. 5 and confirm the 
expected output of an EXOR function. 

Fig. 5. The simulated response of a two-input electron-box biased as an EXOR 
function, as a function of time. 

The rise and fall times in Fig. 5 depend at this moment only on the simulator 
used, and are therefore not discussed here. The simulator used is based on an 
existing theory of tunneling that assumes a zero tunnel time. As long as the tunnel 
time is neglected, the transitions from high to low, and vice versa, are in principle 
very short. 



6 Conclusions 



In this paper we have shown the possibility to use a basic SET structure, the 
electron-box, to implement modulo functions. We showed the binary digital EXOR 
operation as an example of a modulo function. Simulations proved its behaviour. 

More complex and larger circuits can be implemented as well. The periodic 
behaviour of a SET sub-circuit makes the implementation of a modulo function 
more natural than in standard CMOS. 

Abstract. For the coupling of the magnetic field and the electric circuit equations, 
there are different approaches. In any case, the flux linkage has to be taken into 
account by augmenting the finite element system by additional equations. Recently, 
it has been proposed to eliminate the circuit part by taking the Schur complement, 
which results in symmetric and positive definite matrices. The rank of the circuit’s 
contribution to the Schur complement equals the number of linearly independent 
coupling variables. Field-circuit coupling therefore introduces a low rank correction 
into the equations of the field problem. The consequences of this key observation are 
discussed in the paper. If a direct solution of the finite element system is considered, 
the circuit coupling can be treated elegantly by using the Woodbury formula. The 
Woodbury formula gives an explicit expression for the inverse of a matrix with low 
rank correction in terms of the inverse of the original matrix. In the framework 
of a preconditioned conjugate gradient solver it turns out that it is sufficient to 
include the circuit equations into the matrix-by-vector product, while the finite 
element preconditioner can be retained. These considerations will be illustrated by 
numerical results that have been obtained from a simple model problem. 



1 Introduction 

A typical feature of mechatronic systems is the close interaction between various 
physical domains. Even if we restrict ourselves to electromagnetic phenomena, a 
coupling between magnetic field and electric circuit simulation is unavoidable in 
most cases to describe the system’s behaviour correctly. 

Field-circuit coupling can be grouped into three main categories. One approach 
consists of parameter extraction, where the electromagnetic field device is described 
by an equivalent circuit and the system simulation is carried out on the network 
level. Field simulations can be employed to obtain both the topology and the 
parameters of an equivalent circuit. A good account of this topic can be found in 
[8]. 

Another common approach is the so-called direct coupling, where the field and 
the circuit equations are collected in one overall matrix and solved together. This 
can be done either under control of the field or the circuit simulator. Typically, a 
finite element (FE) matrix is augmented by the circuit equations [11], or the FE 
equations are represented in the circuit simulation as a multiport device [12]. In 
contrast, indirect coupling keeps both simulations separated. They communicate 
with each other by coupling matrices. The overall problem has to be solved by 
iteration in this case [1]. 

In this paper direct coupling inside the field simulator shall be pursued further. 
Typically, a large sparse FE matrix has to be augmented by a small dense circuit 
matrix and some coupling matrices. The main goal is to work towards a symmetric 
positive definite (s.p.d.) overall description, while keeping the sparse FE matrix 
untouched. 

The paper is organized as follows. In Sect. 2, the subject is exposed by means 
of an eddy current problem driven by a voltage source, which will serve as a model 
problem. In Sect. 3, it is shown how the Woodbury formula can be used in connec- 
tion with a direct solver to deal with the field-circuit coupling. Next, the field-circuit 
coupling is studied in the framework of a preconditioned conjugate gradient (PCG) 
method. Some conclusions are drawn in Sect. 4. 



2 Setting of the Problem 



The fundamental equation of the eddy current problem is 

$$\operatorname{curl} \nu \operatorname{curl} \mathbf{A} + \sigma \frac{\partial \mathbf{A}}{\partial t} = \mathbf{j}_s , \tag{1}$$

where ν is the reluctivity, A the magnetic vector potential, σ the conductivity 
and j_s an impressed current density. In general, ν depends on the flux density 
B = curl A due to iron saturation. We restrict ourselves to magnetization curves 
with dν/dB > 0. 

The simplest case of field-circuit coupling appears if a single coil driven by a 
voltage source is taken into account. It is assumed that the coil consists of stranded 
conductors, i.e. the wires of the coil are so thin that skin effect can be disregarded. 
However, this is sufficient to show all the principles to be discussed in this paper. 
A simple 3D model problem which falls into this category has been proposed in [7] 
and is shown in Fig. 1. A circular coil is located between two square aluminium 
plates. A step voltage is applied to the coil and the evolution of the current in the 
coil is considered. 




Fig. 1. The model problem: Circular coil between two aluminium plates. On the 
right, the mesh of one eighth of the problem is displayed, which was used for the 
numerical analysis. 

The equation of the electric circuit is R · i + ψ' = u_12, where R is the coil 
resistance, i the exciting current, ψ' the time derivative of the coil’s flux linkage 
and u_12 its terminal voltage. The actual distribution of the wires in the coil, the 
air between them and their insulation shall not be taken into account here. Usually 
it is sufficient to assume that the current is distributed homogeneously over the 
cross section of the coil. The geometry of the winding is described by introducing 
the winding density τ = j_s/i with div τ = 0. Consider one filamentary conductor with 
closed contour C and cross section Δa, which gives a contribution 

$$\Delta\psi = \oint_C \mathbf{A} \cdot \mathrm{d}\mathbf{l} \tag{2}$$

to the coil’s flux linkage. The vectorial line element goes into the direction of the 
current density and can therefore be expressed as dl = (j_s/|j_s|) dl = τ dl Δa. The 
total flux linkage can be obtained by summing up the contributions from the 
individual series-connected filamentary conductors, ψ = Σ Δψ. In the limit of infinitely 
thin filaments Δa → da, Δψ → dψ, and the sum can be expressed by an integral 
over the cross section a_coil of the coil according to 



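$$\psi = \int_{a_{\mathrm{coil}}} \oint_C \mathbf{A} \cdot \boldsymbol{\tau}\, \mathrm{d}l\, \mathrm{d}a = \int_{\Omega_{\mathrm{coil}}} \mathbf{A} \cdot \boldsymbol{\tau}\, \mathrm{d}\Omega . \tag{3}$$

The solution of the coupled field-circuit problem is thus given by solving 
simultaneously 

$$\operatorname{curl} \nu \operatorname{curl} \mathbf{A} + \sigma \frac{\partial \mathbf{A}}{\partial t} - i \cdot \boldsymbol{\tau} = 0, \qquad \int_{\Omega_{\mathrm{coil}}} \frac{\partial \mathbf{A}}{\partial t} \cdot \boldsymbol{\tau}\, \mathrm{d}\Omega + R \cdot i = u_{12}. \tag{4}$$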



After application of the FE method, time discretization by the implicit Euler 
method, and linearization by the Newton-Raphson method, the discrete counter- 
part of (4) can be written in the following matrix form [10] 

Matrix [L] is the sparse symmetric Jacobian matrix resulting from the FE 
discretization, {U} is the field-circuit coupling vector and {δA} and δi are the 
nonlinear increments of the vector potential and the coil current, respectively. If there 
are non-conducting subdomains, the kernel of the double curl operator might give 
rise to a singular matrix [L], depending on the numerical discretization scheme. In 
this case, the line of reasoning of the paper can be applied to the quotient space. 
For the sake of simplicity, we assume σ > 0 throughout, even in the coil, resulting 
in [L] being s.p.d. The complete system (5) is sparse and symmetric, but in any 
case not positive definite, due to the negative diagonal entry −ΔtR. 

In [10] it was proposed to remedy this problem by eliminating δi from the first 
equation of (5). The same algorithm has been adopted in [2]. This corresponds to 
the solution of the system 



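$$\begin{pmatrix} [\tilde{L}] & 0 \\ -\{U\}^T & -\Delta t R \end{pmatrix} \begin{pmatrix} \{\delta A\} \\ \delta i \end{pmatrix} = \begin{pmatrix} \{F_s\} \\ e_s \end{pmatrix}, \tag{6}$$

$$[\tilde{L}] = [L] + \frac{1}{\Delta t R} \{U\}\{U\}^T, \qquad \{F_s\} = \{F\} - \frac{1}{\Delta t R} \{U\}\, e_s . \tag{7}$$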



The problem boils down to the solution of the first equation of (6) with the s.p.d. 
Schur complement matrix [L̃], and subsequent determination of δi by means of 
the second one. Be aware that the coil yields a contribution with all its degrees of 
freedom (DOF) coupled to each other. This means that [L̃] loses sparsity compared 
to [L]. This effect depends on the ratio between the number of coil DOF and the 
total number of DOF. If the considered problem happens to be a coil only, [L̃] will 
even be a dense matrix. Note also that in the asymptotic case h → 0 (h being the 
mesh parameter) the memory requirement for [L̃] will be totally dominated by the 
coil’s contribution. 

This obvious disadvantage can be easily circumvented by observing that the 
coil introduces a rank-1 update to the original matrix [L]. In the general case 
of p independent coils we would obtain a similar rank-p update. This is the key 
observation for the construction of efficient solution schemes. 
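The loss of sparsity caused by the rank-1 update can be illustrated with a small numerical sketch. The matrix size, the tridiagonal stand-in for [L], the unit coupling vector and the value chosen for ΔtR are illustrative assumptions, not data from the model problem.

```python
import numpy as np

def tridiagonal_spd(n):
    """Sparse s.p.d. stand-in for the FE matrix [L] (hypothetical, not an FE assembly)."""
    return 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

n, n_coil = 12, 4          # assumed sizes: 4 of the 12 DOF belong to the coil
L = tridiagonal_spd(n)

u = np.zeros(n)
u[:n_coil] = 1.0           # coupling vector {U}: nonzero only on the coil DOF
dt_R = 0.5                 # assumed value of the product dt * R

# Explicitly formed Schur complement: [L] plus a rank-1 update, cf. (7)
L_tilde = L + np.outer(u, u) / dt_R

nnz_L = np.count_nonzero(L)
nnz_L_tilde = np.count_nonzero(L_tilde)
rank_of_update = np.linalg.matrix_rank(L_tilde - L)

print("nonzeros in [L]:      ", nnz_L)
print("nonzeros in [L~]:     ", nnz_L_tilde)
print("rank of [L~] - [L]:   ", rank_of_update)
```

The coil block of the explicitly formed matrix is completely filled, while the difference to [L] still has rank one. This is what makes it attractive to keep [L] and the update separate.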



3 Solution of the Schur Complement System 

In this section we consider the slightly more general problem 



with $[L] \in \mathbb{R}^{n \times n}$, $[U] \in \mathbb{R}^{n \times p}$ and $[R] \in \mathbb{R}^{p \times p}$. The time step $\Delta t$ has been absorbed
in the matrix [R]. The number of DOF is denoted by n and the number of coupling
variables by $p \ll n$. Now the Schur complement system reads



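$$\begin{pmatrix} [\tilde{L}] & 0 \\ -[U]^T & -[R] \end{pmatrix} \begin{pmatrix} \{\delta A\} \\ \{\delta i\} \end{pmatrix} = \begin{pmatrix} \{F_s\} \\ \{e_s\} \end{pmatrix}, \qquad (9)$$

$$[\tilde{L}] = [L] + [U][R]^{-1}[U]^T. \qquad (10)$$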


The solution of the system

$$[\tilde{L}]\{\delta A\} = \{F_s\} \qquad (11)$$

in the context of time dependent, non-linear eddy current problems will be
considered; this is the first equation of (9). Once this system has been solved, the
remaining unknowns can be obtained from the second equation of (9).
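The elimination step can be checked numerically. The following sketch assumes a block system of the form [[L], −[U]; −[U]^T, −[R]], which is the sign convention suggested by (6) and (7); the random test data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 2                               # assumed sizes: n field DOF, p coupling variables

B = rng.standard_normal((n, n))
L = B @ B.T + n * np.eye(n)               # s.p.d. stand-in for the FE matrix [L]
U = rng.standard_normal((n, p))           # coupling matrix [U]
R = np.diag([0.5, 2.0])                   # s.p.d. circuit matrix [R]
F = rng.standard_normal(n)                # field right-hand side (assumed name)
e = rng.standard_normal(p)                # circuit right-hand side (assumed name)

# Reference: solve the full symmetric but indefinite system directly
K_full = np.block([[L, -U], [-U.T, -R]])
reference = np.linalg.solve(K_full, np.concatenate([F, e]))

# Schur complement route: eliminate the circuit unknowns first
L_tilde = L + U @ np.linalg.solve(R, U.T)        # cf. (10)
F_s = F - U @ np.linalg.solve(R, e)              # modified right-hand side
delta_A = np.linalg.solve(L_tilde, F_s)          # cf. (11)
delta_i = -np.linalg.solve(R, U.T @ delta_A + e) # second block row

print("field part agrees:  ", np.allclose(reference[:n], delta_A))
print("circuit part agrees:", np.allclose(reference[n:], delta_i))
```

The full matrix has p negative eigenvalues, whereas the Schur complement is s.p.d., so only the latter is amenable to Cholesky or CG.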
In the general case of field-circuit coupling the situation becomes more involved.
Besides stranded conductors there are also solid conductors which are subject to
eddy currents [11]. Moreover, a general circuit contains voltage and current sources,
resistors, capacitors and inductors joined together in a complicated way. However,
circuit analysis can be tailored such that the matrix structure (8) is conserved.
Once again, an s.p.d. Schur complement system (11) is obtained by elimination
of the circuit part. For details of the general case the reader is referred to [3,4].



3.1 Direct Solution by Cholesky Decomposition 

Under certain circumstances a direct solution of (11) by Cholesky decomposition 
can be useful. For example, in the code described in [6] a coupling of the FE method 
with the boundary element (BE) method based on domain decomposition (DD) 
is employed. A solution of the FE problem with Dirichlet boundary condition is 
followed by a solution of the BE problem with Neumann boundary condition. This 
iteration is accelerated by a superimposed Krylov subspace method. During the 
course of the iteration, the FE problem must be solved over and over again. Since
the resource requirements for the BE-FE coupling are usually totally dominated
by the BE, it pays off to factorize the FE matrix in (11) to facilitate the repeated
solution.

The basic idea is to solve the problem first without taking into account the
reaction of the field on the circuit part. The "wrong" solution will be corrected
afterwards by including the effect of the rank-p update, which can be done at low
computational cost. This approach is known as the Sherman-Morrison formula in the
case p = 1 and as the Woodbury formula in the case p > 1 [9]. The Woodbury formula
allows $[\tilde{L}]^{-1}$ to be expressed in terms of $[L]^{-1}$ as follows

$$[\tilde{L}]^{-1} = [L]^{-1} - [\tilde{U}][W]^{-1}[\tilde{U}]^T, \qquad (12)$$

$$[\tilde{U}] = [L]^{-1}[U], \qquad [W] = [R] + [U]^T[\tilde{U}]. \qquad (13)$$

If [L] and [R] are s.p.d., then $[\tilde{L}]$ and [W] are also s.p.d. In the sequel we shall
restrict ourselves to that case. Note that the matrix $[U]^T[L]^{-1}[U]$ can be regarded
as the multiport device representation of the FE equations [12].

The field-circuit coupled problem can be solved by the following algorithm: 

— Set up [U]. This matrix can be computed once and for all, because it only
depends on the geometry of the conductors and the numerical formulation.

— After the FE matrix [L] has been set up and factorized, $[\tilde{U}]$ is determined
according to (13). This corresponds to applying backsubstitution p times to
the columns of [U]. Afterwards, [W] can be computed from [R], [U] and $[\tilde{U}]$.
Note that $[\tilde{U}]$ and [W] remain unchanged as long as the FE Jacobi matrix [L]
can be retained, at least during each non-linear iteration step.

— The FE problem is solved as if there was no reaction of the field on the circuit
part, i.e. formally $[U]^T$ is omitted in (8). Equation (10) reveals $[\tilde{L}] = [L]$ in this
case, while $\{F_s\}$ remains unaltered. The solution of (11) with [L] instead of
$[\tilde{L}]$ is corrected afterwards by using the Woodbury formula (12).
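The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: dense NumPy arrays stand in for the sparse FE matrices, the helper names `factor_solver` and `woodbury_solver` are hypothetical, and the triangular systems are solved with a general-purpose routine rather than dedicated forward and back substitution.

```python
import numpy as np

def factor_solver(L):
    """Factorize the s.p.d. matrix L = C C^T once and return a solve function."""
    C = np.linalg.cholesky(L)
    def solve(b):
        # Forward and backward solve with the Cholesky factor; production code
        # would exploit the triangular structure explicitly.
        return np.linalg.solve(C.T, np.linalg.solve(C, b))
    return solve

def woodbury_solver(L, U, R):
    """Return a function applying [L~]^{-1} = [L]^{-1} - [U~][W]^{-1}[U~]^T, cf. (12)."""
    solve_L = factor_solver(L)
    U_tilde = solve_L(U)                 # (13): one solve per column of [U]
    W = R + U.T @ U_tilde                # (13): small p x p matrix
    def solve(F_s):
        x_uncorrected = solve_L(F_s)     # FE solve as if no circuit was present
        correction = U_tilde @ np.linalg.solve(W, U_tilde.T @ F_s)
        return x_uncorrected - correction
    return solve

# Hypothetical test data
rng = np.random.default_rng(1)
n, p = 10, 2
B = rng.standard_normal((n, n))
L = B @ B.T + n * np.eye(n)
U = rng.standard_normal((n, p))
R = np.diag([0.5, 2.0])
F_s = rng.standard_normal(n)

x_woodbury = woodbury_solver(L, U, R)(F_s)
x_naive = np.linalg.solve(L + U @ np.linalg.solve(R, U.T), F_s)
print("Woodbury solution matches explicit Schur complement:",
      np.allclose(x_woodbury, x_naive))
```

Only [L] is factorized here; the explicitly formed Schur complement appears solely as a reference for the comparison.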

We applied the algorithm to the model problem shown in Fig. 1. The coil
contributed about 30% of the total number of 4315 unknowns. A sequence of
70 time steps yielded altogether 1470 DD iterations. A naive implementation with
the Schur complement $[\tilde{L}]$ being actually constructed and factorized was compared
to the proposed algorithm. The FE matrix [L] together with its Cholesky factor
required 5.8 MB of memory, the Schur complement $[\tilde{L}]$ together with its Cholesky
factor 10.2 MB. This extra memory could be saved by the application of the
Woodbury formula. The CPU time for the solution dropped from 309 s to 230 s.
This is mainly due to the fact that the repeated solution of the FE problem based
on the Cholesky factor of [L] is faster, even if the Woodbury correction introduces
some overhead. The computed solution is in very good agreement with measured
values [7].

3.2 Iterative Solution by the Conjugate Gradient Method 

In many practical applications, the system (11) will be solved by iteration. As its
matrix is s.p.d., the PCG method is the method of choice. During the linear iterative
solution process, we have to deal with matrix-by-vector products. From (10) it can
be seen that such products consist of a contribution of [L], which is the same as if
no circuit was present, and a correction by $[U][R]^{-1}[U]^T$. The Schur complement
need not be explicitly formed [3]. The price to be paid for this is two n × p
matrix-by-vector multiplications and the solution of a p × p system.
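A matrix-free product of this kind can be sketched as follows. The small PCG routine, the Jacobi preconditioner taken from [L] alone and the test matrices are illustrative assumptions; they are not the ICCG solver used for the reported experiments.

```python
import numpy as np

def pcg(apply_A, b, apply_M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns the solution and the iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_M_inv(r)
    d = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ad = apply_A(d)
        alpha = rz / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_M_inv(r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return x, max_iter

# Hypothetical data: tridiagonal [L], two coils touching 10 DOF each
n, p = 50, 2
L = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
U = np.zeros((n, p))
U[0:10, 0] = 1.0
U[10:20, 1] = 1.0
R = np.diag([0.5, 2.0])
b = np.ones(n)

def apply_schur(v):
    """[L~] v without forming [L~]: FE part plus the low-rank circuit correction."""
    return L @ v + U @ np.linalg.solve(R, U.T @ v)

diag_L = np.diag(L).copy()
x, iterations = pcg(apply_schur, b, lambda r: r / diag_L)

x_reference = np.linalg.solve(L + U @ np.linalg.solve(R, U.T), b)
print("iterations:", iterations)
print("matches dense solve:", np.allclose(x, x_reference))
```

The preconditioner is built from [L] alone, which corresponds to the simplest strategy of neglecting the circuit part.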

In [3] the question was raised how the s.p.d. Schur complement system (11)
could be effectively preconditioned. The simplest solution would just neglect the
circuit part and apply a good preconditioner for the FE matrix [L]. This can also
be understood from the theoretical point of view. Suppose a good preconditioner
for the FE matrix [L] is available, say [K], which should be easily invertible and
spectrally equivalent to [L]. Then the spectrum of $[K]^{-1}[L]$ is neatly clustered,
yielding a rapid convergence of the PCG iteration. This feature will be inherited
by the spectrum of $[K]^{-1}[\tilde{L}]$ with the exception of at most 2p outlying eigenvalues
resulting from the circuit part. It is well known that the convergence of PCG is not
much affected by a few eigenvalues off the bulk of the spectrum [5, Ch. 9]. More
precisely, after 2p steps the PCG iteration will recover the same speed as if it had
been applied to [L] with preconditioner [K].
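The clustering argument can be made concrete in the idealized limit of an exact preconditioner, [K] = [L]. This is an assumption made only for this sketch; in that special case the preconditioned Schur complement is the identity plus a rank-p term, so exactly p eigenvalues leave the cluster at 1. The matrices below are hypothetical.

```python
import numpy as np

n, p = 40, 3
L = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # stand-in for the FE matrix
U = np.zeros((n, p))
for k in range(p):
    U[5 * k: 5 * k + 5, k] = 1.0        # each assumed coil touches 5 DOF
R = np.diag([0.5, 1.0, 2.0])
L_tilde = L + U @ np.linalg.solve(R, U.T)

# With [K] = [L] = C C^T, the matrix C^{-1} [L~] C^{-T} is symmetric and has the
# same eigenvalues as [K]^{-1} [L~].
C = np.linalg.cholesky(L)
C_inv = np.linalg.inv(C)
S = C_inv @ L_tilde @ C_inv.T

eigenvalues = np.linalg.eigvalsh(S)
outliers = int(np.sum(np.abs(eigenvalues - 1.0) > 1e-8))
print("eigenvalues away from 1:", outliers, "of", n)
```

For a realistic, inexact preconditioner the cluster has finite width, and the bound of at most 2p outliers quoted above applies instead.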

Alternatively, one could be tempted to include the circuit part into the
preconditioner and use

$$[\tilde{K}] = [K] + [U][R]^{-1}[U]^T \qquad (14)$$

instead of [K]. Practically, the preconditioner [K] would be applied as if no circuit
was present and afterwards a correction by means of the Woodbury formula would
take place. However, it is not clear if $[\tilde{K}]^{-1}[\tilde{L}]$ has better spectral properties
than $[K]^{-1}[\tilde{L}]$.

To shed light on that point we conducted some numerical experiments. We con- 
sidered the model problem shown in Fig. 1 and the first step of the DD iteration. 
The iteration starts with a zero vector potential. This gives rise to a FE problem 
with homogeneous Dirichlet boundary condition. We used an incomplete Cholesky 
(IC) factorization as preconditioner and varied the drop tolerance to examine the
influence of the preconditioner's quality. Three different problems have been studied:
FE problem without circuit coupling (matrix $[K]^{-1}[L]$), FE problem with circuit
coupling (matrix $[K]^{-1}[\tilde{L}]$) and FE problem with circuit coupling and modified
preconditioner (matrix $[\tilde{K}]^{-1}[\tilde{L}]$) according to (14). The results are summarized in
Table 1.



Table 1. Performance of ICCG applied to the FE part of the model problem 
shown in Fig. 1 with homogeneous Dirichlet boundary condition. The termination 
criterion for the relative residual norm was set to 10“^®. The table gives the number 
of required iterations for various levels of the drop tolerance of the IC factorization. 



Drop tolerance                < 10^-5   10^-4   10^-3   10^-2   10^-1   > 10^0

W/o circuit coupling                1       3       4       7      17       20

With circuit coupling               2       3       4       7      16       20

With circuit coupling,
modified preconditioner             1       3       4       6      17       18



It can be seen that the rank-1 circuit coupling has hardly any influence on the
convergence, in accordance with the above theoretical considerations. A drop
tolerance < 10^-5 yields a complete Cholesky factorization and the CG scheme degenerates
to a direct solver in the case without circuit coupling. This behaviour is recovered
in the case with circuit coupling, provided that the circuit part is included in the
preconditioner. In all other cases it can be seen that the modification (14) has
hardly any influence. In fact, $[\tilde{K}]$ is a low rank perturbation of [K] so that this
finding is in accordance with theory. We also repeated the experiment for other
field-circuit coupled problems and always found similar behaviour. This suggests
that the observations from Table 1 are of quite general nature and that it is not
worthwhile using the modified preconditioner $[\tilde{K}]$ instead of [K].



4 Conclusions 

In this paper, a direct field-circuit coupling technique has been addressed. The
circuit part has been eliminated from the overall matrix by taking the Schur
complement. The solver has to deal with s.p.d. matrices only. This has already been
pointed out in [10]. The key point is that the Schur complement matrix $[\tilde{L}]$ differs
from the FE matrix [L] by a low rank correction. In case of a direct solver the
situation can be handled elegantly by using the Woodbury formula. As far as
iterative solution by the PCG method is concerned, only the matrix-by-vector product
needs an additional correction. The overhead of this correction is O(n). The original
FE preconditioner can be retained without deterioration of the convergence rate.
Numerical evidence for these findings has been given by means of a simple model 
problem. 
Abstract. In practice the shapes of the electrodes and the spacers in high voltage
equipment are designed such that the electric stress on the electrodes, mainly on the
live electrode, and on the spacers is well within the limits. As a consequence,
the practical contours of the insulators have complex geometries. The complexity
is often increased by the constraints imposed by mechanical considerations in
Gas Insulated Systems (GIS). Suitable techniques are required for accurate yet
efficient simulation of such complex geometries. This paper highlights a modified
algorithm for computation of the electric field distribution by the indirect boundary
element method (indirect BEM) around a complex electrode-spacer configuration used in
practice in high voltage system arrangements.



1 Introduction 

The advent of modern digital computers has encouraged researchers in the area
of electrostatic field calculation to concentrate on developing numerical techniques
for the purpose. Efficient asymmetric field calculations in high voltage equipment
by numerical techniques based on integral equation methods such as the Charge
Simulation Method (CSM) [1-2] or the Boundary Element Method (BEM) [3-4] have been
carried out successfully over the last years. But these numerical methods require
accurate simulation of the electrode and spacer geometries. Hence more
emphasis should be given to simulating the complex geometries of the arrangements
that are used in practice.

For several years, BEM has been applied over planar triangular and rectangular
elements [5-6]. But the practical problem with these elements lies in the fact
that approximation of curved surfaces with these planar elements often results in
inaccurate surface simulation, leading to distortion of the field on the surfaces.

The use of curvilinear triangles in parametric form with quadratic approximation of
surfaces has been reported in the literature [7-8]. But for these types of elements, the
computation time required is large, as has been reported in [4]. The method applied
in [4] requires a large number of elements, leading to a greater time requirement for
efficient handling of realistic field problems.

In the recent past, approximation of the curved surfaces by bi-cubic spline functions
has been reported [9]. But meshing the boundary surface as in [9] requires rigorous
computational strategies. Hence a more precise and convenient approach is required
to solve complicated electric field problems.



2 Indirect Boundary Element Method 

The basis of indirect BEM lies in discretizing the electrode and insulator
surfaces into a large number of small surface elements, termed boundary elements,
in the form of either curvilinear rectangles or triangles. The equivalent charge densities
are considered to reside at the vertices of each of these boundary elements, with a
non-linear distribution inside the elements but a linear distribution along the edges
of the elements. These charge densities are solved from the boundary conditions:

1. The given potential values on the conductor surfaces with known potential are
maintained.

2. The condition on the interface between two dielectrics is satisfied, i.e., the normal
component of the electric flux density is continuous across the dielectric
interface.

and using the following equations

$$\sum_{j=1}^{N} P_{ij}\,\sigma_j = V_i, \qquad i = 1, \ldots, N_c \qquad (1)$$

$$\sum_{j=1,\, j \neq i}^{N} F_{ij}\,\sigma_j + (F_{ii} - 1)\,\sigma_i = 0, \qquad i = N_c + 1, \ldots, N_c + N_f \qquad (2)$$

where N = total number of nodes, $N_c$ = total number of nodes on the conductor surfaces
with known potential, $N_f$ = total number of nodes on the conductors of free
potential, $V_i$ = given potential at the i-th node, $P_{ij}$ = potential at the i-th point
due to unit charge at the j-th node, $F_{ij}$ = electric stress at the i-th point due to
unit charge at the j-th node, $\sigma_i$ = charge density at the i-th node and
$\sigma_j$ = charge density at the j-th node. The second boundary condition imposed
over the dielectric-dielectric interface yields

$$\varepsilon_1 E_{1n}(i) = \varepsilon_2 E_{2n}(i), \quad \text{i.e.,} \quad D_{1n}(i) = D_{2n}(i) \qquad (3)$$

where $\varepsilon_k$ is the permittivity of the medium k, k = 1, 2, 3, ..., and
$E_{jn}(i)$ and $D_{jn}(i)$ denote the normal component of the electric intensity and the normal
component of the electric flux density, respectively, at any point i in medium j,
j = 1, 2, 3, ...

3 Applied Technique 

Practical electrodes and insulators are complex combinations of some basic
geometries which include sphere, cylinder, cone, toroid and disc. These geometries
may also have special features such as holes in their surfaces or truncation
at the edges. Hence, these elements are coded and embedded in
a software package named "ASYMBEM" to cast the final electrode-spacer arrangement.

The entire boundary surface of the arrangement is then subdivided into curvilinear
triangular boundary elements for generating a mesh on the surface. These
boundary surface elements are of first order and are transformed into local
co-ordinates (ξ, η, ζ) to obtain the master triangle. The co-ordinate of any arbitrary
point P inside such an element can now be represented as

$$\alpha = \alpha_1 \xi + \alpha_2 \eta + \alpha_3 \zeta \qquad (4)$$

where α stands for x, y, z; $\alpha_i$ (i = 1, 2, 3) are the co-ordinates of the vertices of
the triangle and ζ = 1 − ξ − η. The equivalent surface charge density has been
approximated by a linear function of the charge densities residing at the nodes of
the surface elements, i.e.

$$\sigma = \sigma_1 \xi + \sigma_2 \eta + \sigma_3 \zeta \qquad (5)$$
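The interpolation rules (4) and (5) can be written down directly. The following sketch uses hypothetical function names and an arbitrary example triangle; it only illustrates the first order mapping from the local co-ordinates of the master triangle to a point and a charge density on the element.

```python
def local_to_global(vertices, xi, eta):
    """Map local co-ordinates (xi, eta) to a point (x, y, z) on the element, cf. (4)."""
    zeta = 1.0 - xi - eta
    # zip(*vertices) groups the x, y and z co-ordinates of the three vertices
    return tuple(a1 * xi + a2 * eta + a3 * zeta for a1, a2, a3 in zip(*vertices))

def charge_density(sigma_nodes, xi, eta):
    """Linear interpolation of the nodal charge densities, cf. (5)."""
    zeta = 1.0 - xi - eta
    s1, s2, s3 = sigma_nodes
    return s1 * xi + s2 * eta + s3 * zeta

# Hypothetical element: three vertices in space and three nodal charge densities
vertices = ((1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0))
sigma_nodes = (1.0, 2.0, 6.0)

centroid = local_to_global(vertices, 1.0 / 3.0, 1.0 / 3.0)
sigma_centroid = charge_density(sigma_nodes, 1.0 / 3.0, 1.0 / 3.0)
print("centroid:", centroid)
print("charge density at centroid:", sigma_centroid)
```

At ξ = 1 the mapping returns the first vertex, at η = 1 the second, and at ξ = η = 0 the third, which is the defining property of the master triangle.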



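Thus the potential at any point Q (x, y, z) due to each boundary surface is
calculated as

$$\phi = \begin{pmatrix} a_{Q1} & a_{Q2} & a_{Q3} \end{pmatrix} \begin{pmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{pmatrix} \qquad (6)$$

where $(a_{Q1}\ a_{Q2}\ a_{Q3})$ is the coefficient matrix for the potential function,

$$\begin{aligned}
a_{Q1} &= \frac{1}{4\pi\varepsilon_0} \int_{\xi=0}^{1} \int_{\eta=0}^{1-\xi} \frac{\xi}{r_{PQ}}\, J \,\mathrm{d}\xi\,\mathrm{d}\eta, \\
a_{Q2} &= \frac{1}{4\pi\varepsilon_0} \int_{\xi=0}^{1} \int_{\eta=0}^{1-\xi} \frac{\eta}{r_{PQ}}\, J \,\mathrm{d}\xi\,\mathrm{d}\eta, \\
a_{Q3} &= \frac{1}{4\pi\varepsilon_0} \int_{\xi=0}^{1} \int_{\eta=0}^{1-\xi} \frac{1 - \xi - \eta}{r_{PQ}}\, J \,\mathrm{d}\xi\,\mathrm{d}\eta, \\
r_{PQ} &= \sqrt{\left[x - (x_1 \xi + x_2 \eta + x_3 \zeta)\right]^2 + \left[y - (y_1 \xi + y_2 \eta + y_3 \zeta)\right]^2 + \left[z - (z_1 \xi + z_2 \eta + z_3 \zeta)\right]^2}
\end{aligned} \qquad (7)$$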



The boundary condition at the dielectric-dielectric interface yields

$$\frac{\varepsilon_1 - \varepsilon_2}{\varepsilon_1 + \varepsilon_2}\, E_n(Q) + \frac{\sigma(Q)}{2\varepsilon_0} = 0 \qquad (8)$$







A. Lahiri, S. Chakravorti 




Fig. 1. Electrode-Spacer Configuration



The normal component of electric stress at any point Q on the dielectric-dielectric
interface due to a charge residing at P is obtained from

$$E_n(Q) = -\nabla\phi \, \cos(\mathbf{r}_{PQ}, \mathbf{n}_Q) \qquad (9)$$



Combining Equations (6), (8) and (9), the coefficient matrix for the entire distribu- 
tion of charge densities is obtained from which the fictitious charge density residing 
at each node of the entire arrangement can be evaluated by solving a set of equa- 
tions represented in a matrix form. With the now known values of the fictitious 
charge densities, the entire solution of the electrostatic field problem at any arbi- 
trary point on the boundary surfaces can be obtained. 

Evaluation of the coefficient matrix has been carried out by integrating equation
set (7) numerically using the Gaussian formula over a triangular domain [10], given by

$$\iint \chi \,\mathrm{d}A = \beta \sum_{i=1}^{N} w_i \, \chi(\xi_i, \eta_i, \zeta_i) \qquad (10)$$

where β = area of the triangle, $w_i$ = the weight associated with each integration
point and N = number of integration points. Since the master triangle is a
right-angled triangle in which each of the two sides enclosing the right angle has
length 1, β = 0.5 for such a triangle.

Fig. 2. Distribution of φ along the insulator surface
Fig. 3. Error in Dn on insulator surface

For solving the potential, 7-point Gaussian integration has been carried out. In
order to reduce the error due to the derivative of the potential function, 13-point
Gaussian integration has been adopted for the calculation of electric field intensity.
The major advantage of using such a numerical technique lies in the fact that the
time and the computer memory required for the evaluation of the entire field
solution are much smaller.
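A 7-point Gaussian rule over the master triangle can be sketched as follows. The points and weights below are those of the classical seven-point rule of polynomial degree 5; that this is the rule taken from [10] is an assumption, since the text does not list its points and weights, and the function names are hypothetical.

```python
import math

SQRT15 = math.sqrt(15.0)

def seven_point_rule():
    """Return (xi, eta, weight) triples; the weights sum to 1, cf. (10) with beta = 0.5."""
    a = (6.0 - SQRT15) / 21.0
    b = (6.0 + SQRT15) / 21.0
    w_a = (155.0 - SQRT15) / 1200.0
    w_b = (155.0 + SQRT15) / 1200.0
    points = [(1.0 / 3.0, 1.0 / 3.0, 9.0 / 40.0)]          # centroid
    for c, w in ((a, w_a), (b, w_b)):
        # the three permutations of the barycentric point (c, c, 1 - 2c)
        points += [(c, c, w), (c, 1.0 - 2.0 * c, w), (1.0 - 2.0 * c, c, w)]
    return points

def integrate_master_triangle(chi, beta=0.5):
    """Approximate the integral of chi(xi, eta, zeta) over the master triangle."""
    return beta * sum(w * chi(xi, eta, 1.0 - xi - eta)
                      for xi, eta, w in seven_point_rule())

def exact_monomial(a, b):
    """Exact integral of xi^a * eta^b over the master triangle."""
    return math.factorial(a) * math.factorial(b) / math.factorial(a + b + 2)

approx = integrate_master_triangle(lambda xi, eta, zeta: xi**2 * eta**3)
print("quadrature:", approx, " exact:", exact_monomial(2, 3))
```

The rule integrates all monomials up to total degree 5 exactly. The kernels in (7) are not polynomials, so for them the formula is only an approximation whose accuracy depends on the distance between the field point and the element.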



4 Application Example 

The electrode-spacer arrangement shown in Fig. 1 has been taken as an appli- 
cation example in this paper. This arrangement is typically used in a 420 kV Gas 
Insulated System (GIS). The inner electrode is the live electrode and is maintained 
at a potential of 1 volt (normalized voltage has been considered for ready reference) 
and the outer cylinder is the ground electrode. Both the electrodes are made of ex- 
truded aluminium. The insulator is composed of epoxy cast resin whose relative 
permittivity is taken as 5.3 and the gas used for the purpose of insulation is an
SF6/N2 mixture (80 % N2 and 20 % SF6) whose relative permittivity is 1.005
[11].

5 Results and Discussions 

The potential plot along the insulator surface in Fig. 2 shows that the potential
decreases uniformly from 1 V to 0 V as one moves from the live electrode towards
the ground electrode. This is expected because the live electrode is maintained at
a potential of 1 V and the ground electrode at 0 V.

Fig. 4. Distribution of φ along the live electrode surface (vertical axis: Potential (Volts))
Fig. 5. Percentage deviation in angle from 90°

For the dielectric-dielectric interface, the boundary condition for the normal
component of the electric flux density has to be maintained, as is seen from Equation
(3). Fig. 3 shows the percentage error on the spacer surface for normal flux densities
calculated from both sides of the spacer surface, i.e.

$$\%\ \text{Error} = \frac{D_{1n} - D_{2n}}{D_{1n}} \times 100 \qquad (11)$$



A total of 330 nodes on the insulator surface restricts this error within 0.5 %, which 
demonstrates the utility of the simulation tool developed. 

Another boundary condition that must also be satisfied is that the given values 
of the potential are maintained on the conductor surfaces. Fig. 4 shows that the 
potential at any point on the live electrode varies from 1 V to 0.9982 V. This is in 
good agreement with the boundary condition because the given voltage on the live 
electrode has been considered as 1 V, as mentioned earlier. 

Fig. 5 shows the percentage deviation of the angle which the resultant stress
makes with the electrode surface. The percentage deviation has been calculated
from 90° because the electric stress is always normal to the electrode
surface. Fig. 5 shows that this percentage deviation is well within 0.06 percent,
which agrees well with the expected result.

Fig. 6 shows the distribution of resultant stress along the live electrode and the
variation of the resultant stress along the spacer surface is presented in Fig. 7.
Both of these stress distributions are in good agreement with the shape of the live
electrode and the geometry of the spacer respectively. The electrode-spacer
arrangement considered in this paper consists of 2400 elementary boundary surfaces
with 975 nodes on the electrode surface and 330 nodes on the insulator surface.
These are the optimized values of the respective parameters for the arrangement
under study, obtained by varying them over a wide range and observing the effect on
the different significant quantities related to the electrostatic field computation
as described in this paper.

Fig. 6. Stress Distribution along the live electrode surface
Fig. 7. Stress Distribution along the insulator surface
Vertical axis: Resultant Stress ((V/mm)/V)

For the configuration under study, the execution time required for obtaining
the results, including the solution of the system of equations, is 40
minutes on a Pentium III PC with 550 MHz CPU clock speed and 128 MB RAM.



6 Conclusions 



The results of the 3-D field analysis presented in this paper reflect the utility of
the developed software "ASYMBEM" in computing the complex electrode-spacer
configurations used in high voltage system arrangements.

The main advantage of "ASYMBEM" lies in the fact that it is capable of
analyzing the electric field on and around the surfaces of complex high voltage system
arrangements in less time and with a smaller computer memory requirement.
The reason behind this is that "ASYMBEM" uses a first order approximation
of the boundary elements instead of a second order approximation. This first order
approximation to curvilinear triangles approximates triangles of smaller areas to a
fairly good degree of accuracy. Hence, "ASYMBEM" is made capable of generating
meshes over the surfaces to produce boundary elements of desired areas.
Abstract. This work deals with the calculation of touch voltages and leakage
current density distribution in direct-current subway grounding systems.

A new approach for direct-current subway grounding system calculations was
developed. In this approach, all system components are modelled as multi-input,
multi-output blocks, which are interconnected using appropriate electrical equations.

Results obtained using the proposed approach show good agreement with results
reported in previous works.

Keywords - Grounding Systems, Subway, EMC 



1 Introduction 



In subways, the feed system operates, in general, in direct current, using one of
the two rails as the negative terminal.

The positive terminal can be a catenary or a lateral rail, called the third rail (3T).
In this work we will always call the positive terminal the third rail, but the method can
also be applied to subways that have a catenary.

Three different metallic parts compose the subway grounding system as Fig. 1 
shows: 

TV - the railway grounding 

TT - the tunnel grounding 

TE - the external grounding 

It is an interconnected system, and Fig. 1 shows that a dangerous voltage between
the platform and the train could be applied to a passenger. In order to decrease the
voltage between the train and the platform, TV and TT could be interconnected.
However, this solution implies an increase in leakage current density and, as a
consequence, the system will suffer from corrosion. Another solution to this problem is
to isolate TV, but this implies an increase in touch voltage.

The compromise between these two antagonistic solutions is the installation of the rail
on a material with high electric resistance, which grounds TV to the structural
hardware of the tunnel. TT is electrically connected, through concrete (high electric
resistivity, 600 Ω·m), to the metallic rings that lock the tunnel, called the
external grounding TE.
Fig. 1. The subway grounding system

2 Methodology 

Figure 2 shows an elementary length of the railway system (length Δx).




Fig. 2. An elementary length of the subway grounding system 



Kirchhoff’s Laws could be applied to this circuit in order to obtain: 
This set of equations could be solved using an equivalent circuit as proposed
by Pereira [1997]. This approach is based on the model of transmission lines
proposed by Stevenson [1982].

In this work, we use a more straightforward approach: the state variable method.
The solution of the set of equations, when all initial currents and voltages are given,
could be written as:

where L is the length of the railway.

This equation shows us that each length of a subway system could be analyzed 
as a multi-input, multi-output block. 
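The idea of treating a length of line as a block can be illustrated with a deliberately simplified example. The sketch below assumes a single conductor with series resistance r and leakage conductance g per unit length, so that the state vector holds one voltage and one current instead of the three voltages and three currents of the subway system; the parameter values and the function name are hypothetical.

```python
import numpy as np

def segment_transfer(A, length):
    """Transfer matrix exp(A * length) of a uniform segment.

    Computed by eigendecomposition, which assumes that A is diagonalizable.
    """
    eigenvalues, eigenvectors = np.linalg.eig(A)
    T = eigenvectors @ np.diag(np.exp(eigenvalues * length)) @ np.linalg.inv(eigenvectors)
    return T.real

# Assumed per-unit-length parameters (illustrative only)
r = 0.02     # series resistance, ohm/km
g = 0.005    # leakage conductance, S/km

# State equations d/dx [V, I]^T = A [V, I]^T with dV/dx = -r I and dI/dx = -g V
A = np.array([[0.0, -r],
              [-g, 0.0]])

T_10km = segment_transfer(A, 10.0)
state_in = np.array([1.0, 0.0])            # voltage and current at the input end
state_out = T_10km @ state_in              # voltage and current 10 km further on
print("state after 10 km:", state_out)

# Two cascaded 5 km blocks are equivalent to one 10 km block
T_5km = segment_transfer(A, 5.0)
print("cascade consistent:", np.allclose(T_5km @ T_5km, T_10km))
```

Because the output state depends linearly on the input state, blocks of this kind can be cascaded by matrix multiplication.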

The electrical substation could be analyzed as a voltage source as Fig. 3 shows.
The train is, in general, modeled as a current source as Fig. 4 shows.




Fig. 3. The electrical substation and the grounding system 



This allows us to write two sets of equations, the first for the substation and the
second for the train:

Fig. 4. The train of the grounding system

So any part of the subway grounding system could be analyzed as a set of equations
for a multi-input, multi-output block.



$$[I_s] = [M] \cdot [I_e] + [C] \qquad (1)$$

where $[I_s]$ is the output state variable and $[I_e]$ is the input state variable.

In order to calculate the currents and the voltages, it is imposed that the first
and the last length are always about 500 m long. Thus, the currents at the left and right
ends of the whole system could be set to zero.

To solve the problem, all the blocks are reduced to a single block in a recursive
process. This single block has 6 equations and 6 unknowns (the voltages at the
boundaries), because the currents at the boundaries are known. Equation (1) and
the association of blocks allow the calculation of the currents and voltages at
every point of the grounding system.
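The reduction of a chain of blocks to a single block can be sketched as a composition of affine maps. The example assumes that each block relates its output state to its input state through a matrix and an offset vector, in the spirit of Equation (1); the two-dimensional states, the random data and the function names are hypothetical.

```python
import numpy as np
from functools import reduce

def cascade(first, second):
    """Combine two blocks so that the output of `first` feeds the input of `second`.

    Each block is a pair (M, C) meaning: output = M @ input + C.
    """
    M1, C1 = first
    M2, C2 = second
    return M2 @ M1, M2 @ C1 + C2

def reduce_blocks(blocks):
    """Reduce an ordered chain of blocks to one equivalent block."""
    return reduce(cascade, blocks)

# Hypothetical chain of three blocks with two state variables each
rng = np.random.default_rng(2)
blocks = [(rng.standard_normal((2, 2)), rng.standard_normal(2)) for _ in range(3)]

M_total, C_total = reduce_blocks(blocks)

# Check against pushing an input state through the chain block by block
state = np.array([1.0, -0.5])
stepwise = state
for M, C in blocks:
    stepwise = M @ stepwise + C

print("reduced block reproduces the chain:",
      np.allclose(M_total @ state + C_total, stepwise))
```

Once the chain has been reduced, the known boundary currents can be imposed on the single equivalent block and the boundary voltages follow from one small linear solve.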



3 Results 

The model was used to analyze the Paulista subway line in São Paulo, Brazil. Figure
5 shows the substations and the position of the trains. In order to validate it, three
cases were analyzed. The first case is a steady state problem. A short-circuit in
the line is also analyzed. A bad contact between TT and TV was also analyzed.
The voltage between the third rail and the rail (3T - TV) and the current in
the third rail in steady state were calculated as a function of distance, as Fig. 6
shows. In this condition, the voltage between TV and the external grounding TE as
a function of distance is shown in Fig. 7. The maximum measured voltage between
TV and TE in this condition is 10 V and the minimum measured voltage is −1 V.
Figure 7 shows agreement between the adopted model and the measurements. It
must be remarked that there are uncertainties in the model parameters and in the
measurements, so only a qualitative comparison could be made. All the obtained
values comply with the VDE standard. A short-circuit in the middle of the line was
also simulated. Figure 8 shows the voltage between TV and TE. Figure 9 shows the
voltage between the third rail and TV (3T-TV) and the current in the third rail in this
condition.

Fig. 6. Voltage between third rail and TV and the current along the third rail
(panels: Voltage 3T - TV (V) versus Distance (km); Current 3T (A) versus Distance (km))

Fig. 7. Voltage between TV and TE (vertical axis: Voltage TV - TE (V))



Fig. 8. Voltage TV-TE during a short-circuit (vertical axis: Voltage TV - TE (V))



A poor grounding condition, e.g., a bad contact between TT and TV, was also
analyzed. Figure 10 shows current density distributions along the subway line. The
bad contact between TT and TV is located between Clínicas station and Consolação
station. The result shows that a corrosion process will begin there, because the
leakage current density has a very high local value.
Fig. 9. Voltage between third rail and TV and the current along the third rail
during a short circuit (horizontal axis: Distance (km))

Fig. 10. Leakage Current Density along the railway: (TT-TV) and (TT-TE)



4 Conclusion 



A new analysis method for subway grounding systems was proposed. It is based
on state variables and multi-input, multi-output blocks. The proposed method is
a powerful approach to subway grounding system design because it is able to
analyze steady-state conditions, short-circuit conditions and even a poor grounding
condition.
Ab-Initio Calculation of Substrate Currents
Using Ghost Field Gauging 
Abstract. Recently a new approach was presented to determine the high-frequency
electromagnetic behavior of on-chip passives and interconnects. The method solves
for the electric scalar and magnetic vector potentials in a prescribed gauge. The gauge
is included by introducing an additional independent scalar field, whose field
equation needs to be solved. This additional field is a mathematical aid that allows
for the construction of a gauge-conditioned, regular matrix representation of the
curl-curl operator acting on edge elements. This paper reports on the convergence
properties of the new method and shows the first results of this new calculation
scheme for VLSI-based structures at high frequencies. The high-frequency behavior
of the substrate current, the skin effect and current crowding is evaluated.



1 Introduction 



One of the important simulation challenges in VLSI design is the adequate charac- 
terization of high-frequency interconnects and on-chip passives. Important effects 
are substrate currents, current crowding at the edges of the interconnects due to 
skin effect and the proximity effect. The characteristic electrical length at the fre- 
quencies under consideration (GHz range) is rather large (cm scale). However, the 
mesh scale required for an accurate field calculation is determined by the very fine 
geometrical details (sub-micron scale) of the metal lines. 

Although the physics of these problems has been understood for a long time, detailed
and fast calculation schemes are still lacking. We recently introduced an approach
to simulate high-frequency effects of on-chip interconnects, dedicated to the
specific geometry of the problem. The detailed description of the method is presented
elsewhere [1-5].

The frequency domain is addressed. The meshing is Cartesian, suitable to show 
the validity of the model and to simulate on-chip interconnects, because in a first 
approximation, interconnects can be regarded as parallel to the axes of a Carte- 
sian frame. However, this is not an essential restriction and the technique can be 
extended to unstructured meshes. 

This paper is organized as follows. In the second section we discuss the need for a
gauge condition. In the third section the essential properties of the solution method
are given, emphasizing the novel aspects of the approach. Then the influence of
the gauge fixing on the eigenvalue spectrum is discussed. In the fifth section several
VLSI-based benchmark structures are calculated. Finally we present our conclusions.
2 Do We Need A Gauge?



To describe the electrodynamical environment, different approaches can be pursued. 
The electric and magnetic field variables can be used as independent variables. 
Since these variables are gauge invariant, no gauge condition is required. This is 
the case in most finite-difference schemes. However, to comply with the needs of
IC designers who work in the (quasi-)static regime, a formulation that uses the
potentials V and A as independent variables, similar to the FIT approach [6-8], is
preferred. The electric potential V is associated with the nodes (vertices), while the
magnetic vector potential A is put on the links (connections, edges) between the nodes
of the mesh. The electrodynamic description results in a Poisson equation for
the electric scalar potential and a curl-curl equation for the magnetic vector potential.
These potentials are not uniquely defined, which results in a singular
matrix representation. In order to arrive at a unique solution for the potentials we
need to introduce a gauge condition. The inclusion of a gauge condition, such as
the Lorentz gauge or Coulomb gauge, is occasionally referred to as 'gauging'. The
curl-curl equation can be regularized by eliminating the unknown vector potentials 
assigned to the edges of a spanning tree. However, this kind of gauging leads to a 
slow convergence of the Krylov-subspace iterative solvers [9]. 

But do we really need to carry out the extra work of fixing the gauge? Let us start 
with a matrix representation of a singular linear system: 



\[ M x = b, \qquad \det(M) = 0. \tag{1} \]



It has been shown that if b is in the range of M, then the standard
conjugate-gradient-like methods are successful [9-11]. However, if b contains a
component outside the range of M, then the problem is ill-posed and no convergence
is reached. Iterative solution methods without gauging are effective if
the right-hand side of the curl-curl equation can be constructed in such a way
that its divergence vanishes, i.e. if there is no component outside the range of
the curl-curl operator. This can be understood by realizing that the Krylov space is
$\mathcal{K}_n = \mathrm{span}\{x, Mx, M^2x, M^3x, \ldots, M^{n-1}x\}$, and that the
search for the solution fully takes place in the range of M.
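
The statement above can be illustrated with a small experiment. The following is only a sketch under stated assumptions: the matrix is a periodic one-dimensional Laplacian (a stand-in for a singular operator, not the curl-curl matrix of this paper), the solver is plain conjugate gradients started from zero, and all names are chosen for illustration.

```python
import math

def matvec(x):
    """Periodic 1D Laplacian; singular, constant vectors span its null space."""
    n = len(x)
    return [2.0 * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(b, max_iter=50, tol=1e-12):
    """Plain conjugate gradients from x = 0; returns (x, true residual norm)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        Mp = matvec(p)
        pMp = dot(p, Mp)
        if pMp <= 1e-14:  # search direction has drifted into the null space
            break
        alpha = rr / pMp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    residual = [bi - mi for bi, mi in zip(b, matvec(x))]
    return x, math.sqrt(dot(residual, residual))

n = 8
b_consistent = [0.0] * n
b_consistent[0], b_consistent[4] = 1.0, -1.0  # entries sum to zero: b is in the range of M
b_inconsistent = [0.0] * n
b_inconsistent[0] = 1.0                       # has a constant component outside the range

_, res_ok = cg(b_consistent)
_, res_bad = cg(b_inconsistent)
```

For the consistent right-hand side the residual is driven to zero, whereas for the inconsistent one the residual can never drop below the norm of the component of b outside the range of M.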

However, such a construction is not always possible. Whereas in magnetostatic
calculations with metallic conductors one can easily ensure that at the start of solving
the equation $\nabla \times \nabla \times A = \mu_0 J$ the condition $\nabla \cdot J = 0$
is satisfied, this is much less easy if a non-linear dependence of the current J on the
vector potential exists. This is the case for non-linear media such as semiconductors
as well as for time-dependent fields. Furthermore, numerical errors are inevitable, and
especially for very large systems or systems that need many iterations, a small component
of b outside the range of M may be amplified and lead to a lack of convergence.
Therefore, a gauging may be preferred. Also for particular problems in the time domain
[12] or with multigrid we must look for an adequate gauge construction. This might be
the tree-cotree gauging [13], the grad-div gauging [7], or the ghost-field gauging [1].
3 Ghost-Field Solver

The method of [1-5] introduces an additional scalar field that needs to be obtained
as part of the solution method. The solution for this additional field does not carry
energy. Therefore, we have named it a 'ghost field', being a mathematical aid that
allows for the construction of a gauge-fixed, regular matrix representation of the
curl-curl operator acting on edge elements. With the use of the ghost-field gauging
technique, the Maxwell problem again results in a Poisson problem for the scalar
potential



\[ -\nabla \cdot (\varepsilon \nabla V) = \rho, \tag{2} \]



and a curl-curl equation for the magnetic vector potential that is solved together
with a gauge equation for the ghost field $\chi$:

\[ \nabla \times \nabla \times A - \gamma \nabla \chi = \mu_0 J - (\ldots), \tag{3} \]

\[ \nabla \cdot A + \nabla^2 \chi = 0. \tag{4} \]



An extra parameter $\gamma$ (with dimension m$^{-2}$) is introduced in order to account for
the dimensions of the system. So instead of the curl-curl operator combined with
the gauge condition,

\[ \begin{pmatrix} \nabla \times \nabla \times \\ \nabla \cdot \end{pmatrix}, \tag{5} \]

which leads to matrices M that are sparse and well-posed, yet not square, the operator

\[ \begin{pmatrix} \nabla \times \nabla \times & -\gamma \nabla \\ \nabla \cdot & \nabla^2 \end{pmatrix} \tag{6} \]

is considered. This operator leads to matrices M that are also sparse, but in addition
regular and square. Moreover, the resulting matrices are semi-definite and therefore
well suited for iterative solvers.



4 Eigenvalue spectrum 

The major drawback of the existing gauging methods is their influence on the
convergence rate of iterative Krylov-based methods. Tree-cotree gauging, for instance,
does not affect the maximum eigenvalue, but the lowest nonzero eigenvalue is
reduced, resulting in an increase of the condition number of the matrix and therefore
(much) slower convergence [9].
Predicting the convergence behavior of iterative linear solvers is a very difficult
task. Only for CGS are there exact proofs that the condition number is
a good indication of convergence. When the condition number is high (low), CGS
will converge slowly (fast).
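
To make the link between condition number and iteration count concrete, the following sketch evaluates the standard textbook error bound for the classical conjugate gradient method on a symmetric positive definite matrix, $\|e_k\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k \|e_0\|_A$. That this is the bound alluded to above is an assumption; the function name is illustrative.

```python
import math

def cg_iteration_bound(kappa, tol):
    """Smallest k with 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k <= tol."""
    if kappa <= 1.0:
        return 1  # perfectly conditioned: one step suffices
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return max(1, math.ceil(math.log(tol / 2.0) / math.log(rho)))

k_small = cg_iteration_bound(1.0e2, 1.0e-6)
k_large = cg_iteration_bound(1.0e4, 1.0e-6)
```

The bound grows roughly like the square root of the condition number, which is why a gauge that shifts the smallest eigenvalue towards zero is expensive for Krylov solvers.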
Fig. 1. Spectrum of the singular $\nabla \times \nabla \times$ operator and the regularized
version for different values of $\gamma$ before preconditioning.



In order to evaluate the ghost-field gauging we look at the eigenvalue spectrum of
the matrix generated by discretizing the system (3)-(4), and at the influence of the
tuning factor $\gamma$. A regular mesh with 7 nodes in each direction is examined. This
accounts for 450 degrees of freedom for the singular curl-curl matrix, and 575 degrees
of freedom for (3)-(4). The results are shown in Figure 1 before preconditioning and
in Figure 2 after preconditioning.
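
The quoted numbers of unknowns can be reproduced by a simple count. The sketch below assumes that unknowns on the outer boundary of the Cartesian mesh are eliminated, so that only interior links carry a vector-potential unknown and only interior nodes carry a ghost-field unknown; this assumption is ours and is not stated in the text.

```python
def interior_counts(nodes_per_direction):
    """Count interior nodes and interior links of a regular n x n x n node mesh."""
    n = nodes_per_direction
    interior_nodes = (n - 2) ** 3
    # per axis: (n - 1) links along the axis at each of (n - 2)^2 interior positions
    interior_links = 3 * (n - 1) * (n - 2) ** 2
    return interior_nodes, interior_links

nodes, links = interior_counts(7)
total = nodes + links
```

With 7 nodes per direction this gives 450 link unknowns and 125 node unknowns, 575 in total; the 125 node unknowns also match the number of zero eigenvalues of the singular curl-curl matrix.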

- Before preconditioning: Figure 1 shows the eigenvalues of the curl-curl
  operator and of the system (3)-(4) before preconditioning. The curl-curl matrix is
  symmetric and therefore the corresponding eigenvalues are all real. Furthermore,
  because of the singular character of the matrix we find multiple (125)
  zero eigenvalues.

  When $\gamma$ is increased, the zero eigenvalues turn into small eigenvalues, while
  the nonzero eigenvalues remain unchanged. The effect on the condition number
  is dramatic, however. While for $\gamma = 0$ the condition number is not affected by
  the zero eigenvalues (only the lowest nonzero eigenvalue is important), this is
  no longer the case for finite values of $\gamma$. In the latter case, the lowest nonzero
  eigenvalue is very small, and therefore the condition number is large. For very
  large values of $\gamma$, we find 250 eigenvalues at the upper end of the spectrum.
  The remaining 325 eigenvalues are still those of the curl-curl operator. So
  the smallest eigenvalue for large $\gamma$ is the same as the smallest nonzero eigenvalue
  for $\gamma = 0$.

- After preconditioning: Figure 2 shows the eigenvalues after ICLU precondition- 
ing for the curl-curl operator and the regularized one. While the eigenvalues 
for the singular operator are close to zero, the regularized version accounts for 
extra imaginary parts in the eigenvalue spectrum. 
Fig. 2. Spectrum of the singular $\nabla \times \nabla \times$ operator and the regularized
version for different values of $\gamma$ after preconditioning (circles for $\gamma = 1000$,
plusses for $\gamma = 0.01$).



5 Simulation results 

5.1 Skin effect 




Fig. 3. Cartesian approximation of a circle, and the current density at 100 GHz.



The quantitative description of the skin effect in a cylindrical wire can be found in
many textbooks. Although the effect was discovered more than 125 years ago, papers
on resistance and inductance calculations still appear today [14,15]. The internal
impedance of a cylindrical wire with radius $a$, conductivity $\sigma$ and skin depth
$\delta$ is given by [16],

\[ Z = \frac{(1+j)\, I_0[(1+j)a/\delta]}{2\pi a \delta \sigma\, I_1[(1+j)a/\delta]}, \tag{7} \]

with the use of the Bessel functions $I_n$. This is an excellent benchmark problem for
high-frequency solvers. We start with a brick representation of the circular cross-section
(Figure 3). The current density at 100 GHz is also shown; for an aluminum
cylinder, the skin depth becomes 0.28 μm, as can be verified in Figure 3.
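
The analytical reference can be evaluated with a few lines of code. The sketch below is not the solver of this paper: it assumes the usual skin-depth expression $\delta = \sqrt{2/(\omega \mu_0 \sigma)}$, an aluminum conductivity of 3.3e7 S/m (our assumption, chosen as a plausible thin-film value), and a truncated power series for the modified Bessel functions that is adequate only for moderate arguments.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m

def skin_depth(frequency_hz, sigma):
    """Skin depth in m for conductivity sigma in S/m."""
    return math.sqrt(2.0 / (2.0 * math.pi * frequency_hz * MU0 * sigma))

def bessel_i(order, z, terms=80):
    """Modified Bessel function I_order(z) from its power series (moderate |z| only)."""
    total = 0j
    for k in range(terms):
        total += (z / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
    return total

def internal_impedance(radius, sigma, frequency_hz):
    """Internal impedance per unit length (ohm/m) of a round wire."""
    delta = skin_depth(frequency_hz, sigma)
    z = (1 + 1j) * radius / delta
    prefactor = (1 + 1j) / (2.0 * math.pi * radius * sigma * delta)
    return prefactor * bessel_i(0, z) / bessel_i(1, z)

SIGMA_AL = 3.3e7  # S/m, assumed value for aluminum
delta_100ghz = skin_depth(100e9, SIGMA_AL)
```

With the assumed conductivity the skin depth at 100 GHz comes out close to the 0.28 μm quoted above, and at low frequency the real part of the impedance reduces to the dc resistance $1/(\pi a^2 \sigma)$.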
Fig. 4. The internal impedance of a cylindrical wire and the current density in the
wire (analytical and simulated).



The impedances calculated with this solver are compared with the analytical
solution in Figure 4. For the line resistance (upper curves), the relative
error is less than 0.08, while for the reactance (lower curves) the results are much
better, and the two curves match closely.

5.2 Ring Structure 

To illustrate the method, we show the results for an aluminum ring embedded in
silicon oxide, on top of a moderately conducting substrate (Fig. 5), at 500 MHz. The
ring dimensions are 90 × 50 μm, the substrate is modeled as a low-conductivity metal
($\rho$ = 0.01 Ωm) and we used a 20 × 20 × 20 mesh. An AC (electric) voltage with an
amplitude of 0.01 V is put on one of the output ports, forcing a current in the ring.

- The skin effect of the currents in the lines that connect the output ports and
  the ring is shown in Fig. 6a. The current crowds at the inner edges of the
  conductor.

- Fig. 6b shows the current density in the substrate, and the circular eddy currents
  in the substrate. In a vector plot of the current density we observe that the
  current flows in the direction opposite to that of the current in the aluminum ring.
  A region of higher substrate current occurs under the input ports due to a
  higher magnetic field and hence a higher induced current.
Fig. 5. Aluminum ring above a low-conductivity substrate.







Fig. 6. Top view of the current densities in the ring (a), of the substrate (b) and 
side view of current densities of the ports (c) at 500 MHz. 



- The side view of the current density in the ports (Fig. 6c) shows the proximity
  effect. Because the currents flow in opposite directions in the two
  ports, these currents attract each other and the highest current density
  is found in the inner corners. Fig. 6 also shows that the current density is
  highest in the inner part of the ring, due to the proximity effect.



6 Conclusions 



In this work we showed that a new approach of ghost field gauging can make the 
curl-curl operator square and regular. The complex values of the eigenvalues depend 
on the choice of the additional parameter 7, and depending on this parameter, 
different convergence patterns can be expected. 

High-frequency effects are recovered with the use of the newly developed approach 
dedicated to spiral inductors and passive structures [1-5]. 
Abstract. We deal with a Quantum-Drift-Diffusion (QDD) model for the description
of transport in semiconductors, which generalizes the standard Drift-Diffusion
model (DD) through extra terms that take into account some quantum dispersive
corrections. We also study numerically the influence on the I-V curve of the electron
effective mass, the barrier height and width, and of the ambient temperature. The
performance of several linearization algorithms, i.e. two Gummel-type iterations
and the fully-coupled Newton method, is also compared.



1 The QDD Model 

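The unipolar QDD model for a semiconductor occupying the open bounded region
$\Omega \subset \mathbb{R}^d$, $d = 1,2,3$, comprises the following set of equations in the
space-time cylinder $\Omega \times (0, t_f]$:

\[
\begin{cases}
\dfrac{\partial n}{\partial t} + \nabla \cdot (\mu n \nabla F) = 0, \\[2mm]
F = V + \underbrace{V_{bar}}_{(I)} - V_{th} \ln\dfrac{n}{n_i}
  + \underbrace{\dfrac{\hbar^2}{6mq}\,\dfrac{\nabla^2\sqrt{n}}{\sqrt{n}}}_{(II)}, \\[2mm]
\nabla^2 V = \dfrac{q}{\varepsilon}\,(n - N_d),
\end{cases}
\tag{1}
\]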

where $n$ is the electron concentration, $V$ the electric potential, $F$ the quasi-Fermi
potential, $N_d$ the doping profile, $V_{bar}$ the barrier potential, $V_{th}$ the thermal voltage,
$n_i$ the intrinsic electron concentration, $\mu$ the mobility, while the quantities
$\hbar, q, \varepsilon, m$ are the reduced Planck constant, the (positive) electron charge, the
semiconductor permittivity and the electron effective mass, respectively. The above system,
supplemented with suitable initial and boundary conditions, is to be solved for the
unknowns $n, V, F$. The electron current density is given by the constitutive law
$J = -q \mu n \nabla F$, so that (1)$_1$ represents the classical continuity equation but with a
nonclassical constitutive law for the current density. Moreover, as in the DD case,
the temperature is assumed to be constant, all energy exchange phenomena having
been neglected. In (1)$_1$-(1)$_3$ we have singled out the two terms that characterize
the QDD model with respect to the DD model, i.e.
(I) is the potential due to the presence of the heterostructure corresponding to
barriers and wells between the different materials, while
(II) is the dispersive term, modeling a typical quantum effect [1].
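
To give a feeling for the magnitudes involved, the following sketch evaluates the thermal voltage $V_{th} = k_B T / q$ and the coefficient $\hbar^2/(6mq)$ multiplying the dispersive term (II). The physical constants are the standard SI values; the function names and the choice of sample arguments are ours.

```python
HBAR = 1.054571817e-34   # reduced Planck constant, J s
Q = 1.602176634e-19      # elementary charge, C
K_B = 1.380649e-23       # Boltzmann constant, J/K
M0 = 9.1093837015e-31    # free-electron mass, kg

def thermal_voltage(temperature_k):
    """Thermal voltage k_B T / q in volts."""
    return K_B * temperature_k / Q

def quantum_coefficient(relative_mass):
    """hbar^2 / (6 m q) in V m^2, with effective mass m = relative_mass * M0."""
    return HBAR ** 2 / (6.0 * relative_mass * M0 * Q)

vth_77 = thermal_voltage(77.0)
vth_300 = thermal_voltage(300.0)
alpha_light = quantum_coefficient(0.067)
alpha_heavy = quantum_coefficient(0.201)
```

The coefficient is inversely proportional to the effective mass, so the quantum correction weakens as the mass increases, while the thermal voltage grows linearly with temperature.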

The bipolar QDD model has been studied in [7] where the conditions under 
which existence and uniqueness of the solution at thermodynamic equilibrium hold 
are discussed, basically, via a variational argument consisting in minimizing the 
total energy of the system. In nonequilibrium conditions, the uniqueness of the 
solution is proved to hold only for moderate external voltages ([7]). This fact has a 
real counterpart in RTD’s which exhibit a hysteresis cycle for particular values of 
the external voltage, as proved experimentally, as well as numerically ([3]). 



2 Iterative Maps and Numerical Algorithms 

In this section we address both the linearization techniques and the numerical issues 
necessary to obtain an approximate solution to the QDD equations. 



2.1 Iterative Maps 

To solve the (nonlinear) system (1)$_1$-(1)$_3$ it is necessary to employ some linearization
technique. In particular, we have compared three different functional iterative
algorithms:

1. The fully-coupled Newton method;

2. A two-step fixed-point map for $F$, which can be summed up as: given $F^{(k)}$,

   - first solve (1)$_2$-(1)$_3$ for $(n^{(k+1)}, V^{(k+1)})$ with the coupled Newton's method, then

   - solve the (linear) equation (1)$_1$ for $F^{(k+1)}$;

3. A three-step generalized Gummel's map such that: given $(V^{(k)}, F^{(k)})$,

   - solve the (nonlinear) equation (1)$_2$ for $n^{(k+1)}$, then

   - solve the nonlinear version of (1)$_3$ for $V^{(k+1)}$,

     \[ \nabla^2 V^{(k+1)} = \frac{q}{\varepsilon}\left( n^{(k+1)} \exp\frac{V^{(k+1)} - V^{(k)}}{V_{th}} - N_d \right), \]

     finally

   - solve the (linear) equation (1)$_1$ for $F^{(k+1)}$.

The three algorithms above are ordered by decreasing degree of coupling: algorithm
1 is a very well-known general purpose method, algorithm 2 is studied theoretically and
numerically in [7], whereas algorithm 3 is our novel contribution, first
devised in [8]. This last algorithm generalizes the classical Gummel map to the
QDD model, whereby three successive subproblems are to be solved at each step.
In particular, it consists of two nonlinear steps for $n^{(k+1)}$ and $V^{(k+1)}$, respectively,
and a linear step for $F^{(k+1)}$. We have applied the three algorithms to the solution
of a RTD and some numerical results are shown in Sect. 3. Hereafter, we shall study
the stationary QDD model, i.e. we shall assume $\partial/\partial t = 0$ for all the variables.
Let us dwell on algorithm 3, which can be considered as a generalized Gummel map,
see e.g. [4].




Numerical Simulation of a RTD with a QDD model 

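From (1)$_2$ and letting $\alpha = \hbar^2/(6mq)$, we have

\[ n = n_i \exp\left( \frac{V - F + V_{bar} + \alpha\,\nabla^2\sqrt{n}/\sqrt{n}}{V_{th}} \right), \]

and, substituting in (1)$_3$, we obtain

\[ \nabla^2 V = \frac{q}{\varepsilon}\left[ n_i \exp\left( \frac{\alpha\,\nabla^2\sqrt{n}/\sqrt{n} + V_{bar}}{V_{th}} \right)
   \exp\left( \frac{V - F}{V_{th}} \right) - N_d \right], \tag{2} \]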



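which is the nonlinear version of the Poisson equation. Given $(V^{(k)}, F^{(k)})$, $k \geq 0$, the
first step of the algorithm consists in solving the following boundary value problem
for $n = n^{(k+1)}$:

\[
\begin{cases}
\alpha\,\dfrac{\nabla^2\sqrt{n}}{\sqrt{n}} - V_{th} \ln\dfrac{n}{n_i} + V_{bar} + V^{(k)} - F^{(k)} = 0 & \text{in } \Omega, \\[2mm]
n = n_{eq} & \text{on } \Gamma_D, \\
\nabla n \cdot \nu = 0 & \text{on } \Gamma_N,
\end{cases}
\tag{3}
\]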



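where $n_{eq}$ is the equilibrium value of the concentration, while $\Gamma_D$, $\Gamma_N$ are two
subsets of the boundary $\partial\Omega$ such that $\partial\Omega = \Gamma_D \cup \Gamma_N$, with
$\Gamma_D \neq \emptyset$, and $\nu$ is the unit outward normal vector to $\partial\Omega$. Then from (3)
we formally obtain

\[ V_{th} \ln\frac{n^{(k+1)}}{n_i} = V^{(k)} - F^{(k)} + \alpha\,\frac{\nabla^2\sqrt{n^{(k+1)}}}{\sqrt{n^{(k+1)}}} + V_{bar}, \]

which used in (2) allows us to write the second step of the algorithm, to solve for
$V = V^{(k+1)}$, as

\[
\begin{cases}
\nabla^2 V = \dfrac{q}{\varepsilon}\left( n^{(k+1)} \exp\dfrac{V - V^{(k)}}{V_{th}} - N_d \right) & \text{in } \Omega, \\[2mm]
V = V_{eq} + V_{ext} & \text{on } \Gamma_D, \\
\nabla V \cdot \nu = 0 & \text{on } \Gamma_N,
\end{cases}
\tag{4}
\]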

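where $V_{eq}$, $V_{ext}$ are the equilibrium value of the potential and the external applied
voltage, respectively. We point out that this equation can be regarded as the QDD
counterpart of the DD nonlinear Poisson equation for $V$, i.e.

\[ \nabla^2 V = \frac{q}{\varepsilon}\left( n_i \exp\frac{V - F}{V_{th}} - N_d \right), \]

where $n_i$ is replaced by $n_i \exp\left( (\alpha\,\nabla^2\sqrt{n}/\sqrt{n} + V_{bar})/V_{th} \right)$.
Finally, the last step of the algorithm requires solving the linear problem for
$F = F^{(k+1)}$:

\[
\begin{cases}
\nabla \cdot (\mu\, n^{(k+1)} \nabla F) = 0 & \text{in } \Omega, \\
F = F_{eq} + V_{ext} & \text{on } \Gamma_D, \\
\nabla F \cdot \nu = 0 & \text{on } \Gamma_N,
\end{cases}
\tag{5}
\]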



where $F_{eq}$ is the equilibrium value of the quasi-Fermi level. The sequence of
problems (3)-(5) defines our version of the Gummel map applied to the iterative
solution of the QDD model. We remark that the nonlinear steps (3)-(4) are both
linearized with Newton's method. Moreover, computationally, it is advantageous to
solve problem (3) for $w = \sqrt{n}$ instead of $n$, which yields the following problem for
$w = w^{(k+1)}$:

\[
\begin{cases}
\alpha \nabla^2 w - 2 V_{th}\, w \ln\dfrac{w}{w_i} + w\,(V_{bar} + V^{(k)} - F^{(k)}) = 0 & \text{in } \Omega, \\[2mm]
w = w_{eq} & \text{on } \Gamma_D, \\
\nabla w \cdot \nu = 0 & \text{on } \Gamma_N,
\end{cases}
\tag{6}
\]

where $w_i = \sqrt{n_i}$ and $w_{eq} = \sqrt{n_{eq}}$.



2.2 Numerical Algorithms 

As far as the discretization is concerned, all three subproblems (6), (4)-(5),
after linearization, are solved using piecewise linear finite elements. Special
care is necessary for (5), which requires using piecewise harmonic averages for the
coefficient $\mu n^{(k+1)}$. Important issues to deal with are the scaling of the unknowns and
the computational cost required to solve all of the algebraic systems arising after
the discretization. With reference to a one-dimensional finite element mesh whose
characteristic size is h, the following scaling has been used throughout:

n = Nh\ V = lO-^^y|, F = V, 

where N is the value of the doping at the contacts. We have carried out an extensive
numerical validation which shows that this scaling leads to better conditioning
of the algebraic linear systems and to better balanced coefficient matrices. Finally,
to compare the three algorithms in terms of computational cost, we have implemented
all the numerical codes in Matlab and we have simulated a one-dimensional
RTD under different conditions. The results, extensively discussed in [8], show that
algorithm 1 is the cheapest one when the sparse structure of the matrices is taken
into account; otherwise algorithm 3 performs better. Of course, the algorithms
might perform differently on multi-dimensional problems.



3 Numerical Results 

We show in this section several numerical results referring to a one-dimensional 
RTD, i.e. a heterostructure based on two AlGaAs barriers and a quantum GaAs 
well. The model correctly reproduces the Negative Differential Resistance (NDR) 
of the I-V characteristic of the device. We aim at studying physical phenomena like 
the dependence of the I-V characteristics on the electron effective mass, ambient 
temperature, barrier height and width, as well as to test the numerical algorithms, 
as already discussed in the previous section. We have considered a RTD whose 
geometry is shown in Fig. 1 (left): the device length is 75 nm, the doping profile is
5 X 10^^ m“^ in the channel (well and barriers) and 10^^ m“^ elsewhere, while all 
the other parameters vary according to the following simulations. The mobility is 
assumed to be temperature dependent (cfr. [6]). 
Dependence on the Effective Mass

The following figures show some numerical results for a RTD at 77 K. The barrier
profile is shown in Fig. 1 (right): the height and width are 0.3 V and 5 nm,
respectively, while the quantum well is 5 nm wide. Throughout, $J_{max}$, $J_{min}$ denote
the current density at the peak and at the valley, respectively, and the Peak to
Valley Ratio (PVR) is defined as PVR = $J_{max}/J_{min}$. The four plots in Fig. 2




Fig. 1. Geometry (left) and barrier profile (right) 



display the I-V characteristic for the following values of the effective mass m (from
left to right and top-down): [0.067, 0.126, 0.1675, 0.201] $m_0$, where $m_0$ is the
free-electron mass. Table 1 (left) summarizes the main results. Notice that the higher
the effective mass, the stronger the NDR phenomenon, while the PVR increases
progressively. Notice also that for the lowest value of the effective mass the NDR does
not occur. The pair of plots in Fig. 3 shows the electron concentration at
the peak and at the valley of the current for the fixed value of the effective mass
$m = 0.126\,m_0$. In particular, notice the large values of the concentration inside the
quantum well, especially at the valley, which confirms the resonant phenomenon, in
accordance with the basic theory of the device ([5]).

Dependence on the Temperature 

The second series of simulations is carried out with barrier height and width of 0.35
V and 5 nm, respectively, while the quantum well width is 5 nm and the effective
mass is $m = 0.126\,m_0$. Several values of the ambient temperature are considered,
i.e. [77, 100, 150, 200, 300] K. Table 1 (right) collects the results (in SI units).
Observe that the NDR is a typical low temperature phenomenon, weakening as the
temperature rises. This behavior can be explained theoretically by considering the
different dependence of the current at the valley and at the peak on the temperature.
In the first case, the resonant level is inside the energy gap, thus the higher
the temperature, the larger the current, because more electrons occupy the energy
levels near the conduction band. In the second case, the resonant level coincides
with the conduction band, thus the current is less sensitive to the temperature
because all the levels near the conduction band have large tunneling probabilities,
almost independently of their occupation. As a consequence, the PVR decreases as
the temperature rises, until the NDR eventually disappears.



Table 1. I-V characteristics as a function of m (left) and T (right)

  m/m_0    J_max × 10^?   J_min × 10^?   PVR    |   T [K]   μ      J_max × 10^?   J_min × 10^?   PVR
  0.067    6.22           6.22           //     |    77     2.5    2.41           1.23           1.96
  0.126    5.19           3.84           1.35   |   100     2.61   3.97           2.6            1.53
  0.1675   1.52           0.827          1.84   |   150     2.41   9.07           8.45           1.07
  0.201    0.654          0.284          2.30   |   200     2.32   20.3           20.3           //
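
As a consistency check on Table 1, the peak-to-valley ratio can be recomputed from the tabulated peak and valley current densities. The sketch below uses the mantissas as printed; the assignment of the values to rows follows our reading of the table, and the common power-of-ten factors (which cancel in the ratio) are not needed.

```python
def peak_to_valley(j_max, j_min):
    """Peak to Valley Ratio, PVR = J_max / J_min."""
    return j_max / j_min

# (J_max, J_min) mantissas keyed by relative effective mass (left half of Table 1)
rows_mass = {0.126: (5.19, 3.84), 0.1675: (1.52, 0.827), 0.201: (0.654, 0.284)}
# (J_max, J_min) mantissas keyed by temperature in K (right half of Table 1)
rows_temp = {77: (2.41, 1.23), 100: (3.97, 2.6), 150: (9.07, 8.45)}

pvr_mass = {m: round(peak_to_valley(*j), 2) for m, j in rows_mass.items()}
pvr_temp = {t: round(peak_to_valley(*j), 2) for t, j in rows_temp.items()}
```

The recomputed ratios agree with the PVR column to two decimals, increasing with the effective mass and decreasing with the temperature.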



Dependence on the Barrier Height 

For this block of simulations we have taken a barrier profile with a barrier width
of 5 nm and a quantum well width of 5 nm, while the effective mass is $m =
0.126\,m_0$ and T = 77 K. Several values of the barrier height are considered, i.e.
[0.1, 0.2, 0.25, 0.32, 0.4] V, and the results are summarized in Table 2 (left) (in SI
units). In this case, the higher the barrier, the lower the tunneling probability,
thus we expect a decrease of the current as the height increases. The PVR should
increase with the height because of the contribution to the total current of those
electrons drifting thermionically, which become more important at the valley ([2]).



Dependence on the Barrier Width 

For the last series of simulations we have considered a barrier height of 0.325 V,
a quantum well width of 5 nm, a fixed device length of 75 nm, an effective mass
$m = 0.126\,m_0$, and T = 77 K. The following values of the barrier width have been
studied: [5, 8, 9, 10] nm, and Table 2 (right) collects the simulation results (in SI
units). As the barrier width gets larger the current decreases due to the reduced
width of the peak value of the transmission coefficient. Moreover, the PVR, starting
with a barrier width equal to the width of the quantum well, should first increase
and then decrease ([9]).

4 Conclusions 

We have studied and applied the QDD model to the simulation of a RTD. It turns
out that the QDD model is able to reproduce typical quantum effects of the device,
such as the NDR, and correctly reproduces physical quantities such as the electron
concentration, electric potential and quasi-Fermi level, even over a wide range of
variations of the parameters. We have also carried out a parametric study of the I-V
characteristic of the RTD as a function of the electron effective mass, the width and
height of the barriers, and the temperature. Moreover, we have compared three
numerical algorithms and we have proposed a suitable scaling of the equations in
order to obtain numerically stable problems.
Table 2. I-V characteristics at different barrier heights (left) and widths (right)
Abstract. We present applications of homotopy methods which make it possible
to compute multiple dc-operating points of transistor circuits with standard
network analysis programs. At least for smaller transistor networks, it is possible to
capture all dc-operating points with the help of one- and two-parametric homotopies.
Uniqueness criteria of network theory can help to find a parameterization of the
homotopy path. As an example of an appropriate uniqueness criterion, a well-known
theorem of Nielsen and Willson is applied. Bounds for the parameter space can be
found by means of the no-gain property of transistor circuits.



1 Introduction 

Most standard network analysis programs provide some Newton-Raphson-like
algorithm for dc-analysis that computes just one dc-operating point (e.g. the
Newton-Raphson algorithm with source stepping). But often it is important to
know whether a circuit can exhibit dc-operating points other than the computed
one. We only remind the reader of the unwanted latch-up effect of the operational
amplifier μA709 (see [2]).

In [8] new classes of homotopy methods for the computation of multiple dc- 
operating points have been introduced which can be realized by means of stan- 
dard network analysis programs. Here we give a short review of the driving-point- 
characteristic method (dpc-method), discuss its feasibility, and show some applica- 
tions in greater detail. 



2 Some network theoretic preliminaries 

In this paper a transistor circuit is modeled as a resistive network $\mathcal{N}$. The
topology of $\mathcal{N}$ is described by an oriented graph $\mathcal{G}$ with some branch set $Z$
(e.g. one branch per resistor, two branches per transistor). Normalized voltage and
current assignments to the branches of $\mathcal{G}$ are the elements $(v, i)$ of the set¹
$\mathcal{S} := \mathbb{R}^Z \times \mathbb{R}^Z$.

* e-mail naehring@iee.et.tu-dresden.de
** e-mail reibiger@iee.et.tu-dresden.de
¹ An element $x \in \mathbb{R}^Z$ assigns to each branch $b \in Z$ a value $x_b \in \mathbb{R}$.

The v-i-relation $\mathcal{V} \subset \mathcal{S}$ models the behavior of the devices contained in the
circuit. It is most often described by equations. E.g., to model bipolar transistors we use
the Ebers-Moll equations

\[ i_c = I_{cs}\,(\exp(v_c/V_T) - 1) - \alpha_f I_{es}\,(\exp(v_e/V_T) - 1), \]

\[ i_e = I_{es}\,(\exp(v_e/V_T) - 1) - \alpha_r I_{cs}\,(\exp(v_c/V_T) - 1), \]

where e is the branch from base to emitter and c is the branch from base to collector,
and where the constants $V_T$, $I_{cs}$, $I_{es}$, $\alpha_f$, $\alpha_r$ have the usual meaning.
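
The Ebers-Moll relation can be written down directly as a function of the two branch voltages. The sketch below is illustrative only: the parameter values are hypothetical defaults chosen by us (with $\alpha_f I_{es} = \alpha_r I_{cs}$), not data from this paper.

```python
import math

def ebers_moll(v_e, v_c, i_es=1.0e-14, i_cs=1.98e-14,
               alpha_f=0.99, alpha_r=0.5, v_t=0.02585):
    """Return (i_e, i_c) for base-emitter voltage v_e and base-collector voltage v_c.

    All default parameter values are hypothetical and serve only as an example.
    """
    diode_e = i_es * (math.exp(v_e / v_t) - 1.0)
    diode_c = i_cs * (math.exp(v_c / v_t) - 1.0)
    i_e = diode_e - alpha_r * diode_c
    i_c = diode_c - alpha_f * diode_e
    return i_e, i_c

i_e_rest, i_c_rest = ebers_moll(0.0, 0.0)
i_e_active, i_c_active = ebers_moll(0.65, -2.0)
```

With both junctions unbiased the currents vanish, and in the forward-active region the magnitude of the collector branch current is close to $\alpha_f$ times the emitter branch current.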

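For a network $\mathcal{N}$ with graph $\mathcal{G}$ and v-i-relation $\mathcal{V}$ we write, in condensed
form, $\mathcal{N} = (\mathcal{G}, \mathcal{V})$. The electrical junctions between the parts of the circuit are
modeled by the graph $\mathcal{G}$ and the corresponding mesh and cutset equations. The set of
voltage-current assignments $(v, i) \in \mathcal{S}$ obeying Kirchhoff's laws is named Kirchhoff's set
and denoted by $\mathcal{K}$. Finally, we have $\mathcal{L} := \mathcal{V} \cap \mathcal{K}$, the solution set of
$\mathcal{N}$, which stands for the set of the dc-operating points of the entire circuit. With these
notations the following two statements are direct consequences of set theory.

Intersection theorem: Let $\mathcal{N} = (\mathcal{G}, \mathcal{V})$, $\mathcal{N}' = (\mathcal{G}, \mathcal{V}')$,
$\mathcal{N}'' = (\mathcal{G}, \mathcal{V}'')$ be networks with the same graph $\mathcal{G}$. If
$\mathcal{V} = \mathcal{V}' \cap \mathcal{V}''$ then $\mathcal{L} = \mathcal{V}' \cap \mathcal{L}''$.

Covering theorem: Let $\mathcal{N} = (\mathcal{G}, \mathcal{V})$ be a network and $(\mathcal{N}^x)_{x \in X}$ a family of
networks $\mathcal{N}^x = (\mathcal{G}, \mathcal{V}^x)$ with the same graph as $\mathcal{N}$. If
$\mathcal{V} = \bigcup_{x \in X} \mathcal{V}^x$ then $\mathcal{L} = \bigcup_{x \in X} \mathcal{L}^x$. ∎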
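
Both theorems are elementary set identities and can be checked on a finite toy example. In the sketch below every set is hypothetical: pairs of integers stand in for voltage-current assignments, and the primes in the variable names mirror two auxiliary networks sharing one Kirchhoff set.

```python
# Hypothetical finite stand-ins for sets of (v, i) assignments.
kirchhoff = {(0, 0), (1, 1), (2, 2), (3, 1)}      # Kirchhoff's set
v_prime = {(0, 0), (1, 1), (2, 2), (5, 5)}        # v-i-relation of the first auxiliary network
v_second = {(1, 1), (2, 2), (3, 1), (4, 0)}       # v-i-relation of the second auxiliary network

# Intersection theorem: V = V' & V''  implies  L = V' & L''.
v_rel = v_prime & v_second
solutions = v_rel & kirchhoff
solutions_second = v_second & kirchhoff
intersection_holds = solutions == (v_prime & solutions_second)

# Covering theorem: V'' is covered by the slices V^x = {(v, i) in V'' | v = x}.
slices = {x: {p for p in v_second if p[0] == x} for x in range(5)}
covered_solutions = set().union(*((s & kirchhoff) for s in slices.values()))
covering_holds = solutions_second == covered_solutions
```

Slicing a relation by the value of one voltage, as in the last step, is the same construction that the dpc-method applies to the voltage across an inserted branch.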



Throughout this paper we assume that networks are connected and do not have
cutsets of independent current sources or loops of independent voltage sources. The
transistor models have the no-gain property (see [6]). So the absolute value of any
branch voltage (current) of a solution of a network $\mathcal{N}$ does not exceed the sum of
the absolute values of the voltages (currents) of the independent sources in $\mathcal{N}$.

Let $\mathcal{N}$ be a network consisting of resistors, independent voltage and current
sources and transistors. We say that $\mathcal{N}$ is reducible to the feedback structure if

[Figure: the feedback structure]

is a possible outcome of the following three-step algorithm:

1. choose two transistors (npn and/or pnp), replace them by the generalized
   transistor symbol [symbol], replace all others by the resistor network [symbol];

2. choose some resistors and remove them; remove all current sources;

3. contract² all voltage-source branches and all remaining resistor branches.



If $\mathcal{N}$ is not reducible to the feedback structure then $\mathcal{N}$ has at most one
dc-operating point. This uniqueness statement is a consequence of the well-known
fundamental theorem of Nielsen and Willson for bipolar transistor networks (cf. [4]).
We will utilise it in section 3.

In certain examples, where additional controlled sources were involved, the
method of Hasler and Neirynck (see [7]) has been successfully employed as a
replacement for the theorem of Nielsen and Willson.



3 The dpc-method for the global dc-analysis of 
transistor circuits 

In this section we describe, by way of introduction, an application of the dpc-method
to the simple flip-flop network in figure 1.

² 'Contracting a branch' means identifying its incident nodes and removing it.

For the analysis of this network an open circuit a is inserted which does not
influence the solutions of the network. The network $\mathcal{N} = (\mathcal{G}, \mathcal{V})$ modified in this
way is depicted in figure 2. As we will see below, it is possible to determine all
dc-operating points of $\mathcal{N}$ with the help of this additional branch.




In this example the dpc-method is applied to the interconnection of the flip-flop
network with the open circuit a at the terminals A and B.

The v-i-relation $\mathcal{V}$ of $\mathcal{N}$ is represented as the intersection
$\mathcal{V} = \mathcal{V}^\infty \cap \mathcal{V}^0$ of the v-i-relations of two modified networks
$\mathcal{N}^\infty$ and $\mathcal{N}^0$. The network $\mathcal{N}^\infty$ is constructed
from $\mathcal{N}$ by replacing the open circuit by a norator³ (cf. fig. 3), i.e. the equation
$i_a = 0$ is removed from the system of behavioral equations for this network. In
the system of defining equations for the solution set $\mathcal{L}^\infty$ of $\mathcal{N}^\infty$, the number of
variables exceeds the number of equations by one. Thus, we expect that
$\mathcal{L}^\infty \subset \mathcal{S}$ is a one-dimensional manifold. We will use a partial curve
$\mathcal{C} \subset \mathcal{L}^\infty$ as homotopy path. The projection of $\mathcal{L}^\infty$ onto the
$(v_a, i_a)$-plane is shown in figure 5.

The other network $\mathcal{N}^0$ results from $\mathcal{N}$ by exchanging all branches except the
open circuit for norators. Thus, the only behavioral equation left for this network
is the equation $i_a = 0$, and $\mathcal{V}^0$ is a hypersurface of the space $\mathcal{S}$.

With $\mathcal{V} = \mathcal{V}^\infty \cap \mathcal{V}^0$ the intersection theorem implies
$\mathcal{L} = \mathcal{L}^\infty \cap \mathcal{V}^0$, i.e., the intersection points of $\mathcal{L}^\infty$ with the
hypersurface $\mathcal{V}^0$ are the dc-operating points of $\mathcal{N}$. To find some of them we
search the homotopy path $\mathcal{C} \subset \mathcal{L}^\infty$.

The homotopy path can be computed via a dc-sweep of spice: For any $x \in \mathbb{R}$
let $\mathcal{N}^x = (\mathcal{G}, \mathcal{V}^x)$ be the network that results from replacing the
norator in $\mathcal{N}^\infty$ by a voltage source with source voltage $x$ (see fig. 4). For
$x \in \mathbb{R}$ the v-i-relation $\mathcal{V}^x$ of $\mathcal{N}^x$ differs from
$\mathcal{V}^\infty$ only by the additional condition $v_a = x$ of the inserted voltage source and
can therefore be written as $\mathcal{V}^x = \{(v,i) \in \mathcal{V}^\infty \mid v_a = x\}$. Thus,
$\mathcal{V}^\infty = \bigcup_{x \in \mathbb{R}} \mathcal{V}^x$, and by the covering theorem we have
$\mathcal{L}^\infty = \bigcup_{x \in \mathbb{R}} \mathcal{L}^x$. If for each $x \in \mathbb{R}$
the solution of the network $\mathcal{N}^x$ is unique (i.e., the solution set $\mathcal{L}^x$ has
just one element), we obtain a map $x \mapsto (v,i) \in \mathcal{L}^x$ which parameterizes
$\mathcal{L}^\infty$. Actually, this is the motivation for the insertion of a voltage source at the
nodes A and B (review fig. 1 to 4). The inserted voltage source breaks all feedback structures of
$\mathcal{N}$, i.e. the network $\mathcal{N}^x$ is not reducible to the feedback structure and the
theorem of Nielsen and Willson ensures that the network has a unique solution.

^ For norators all pairs of voltages and currents are admissible (see e. g. [7]).

In general, it is only possible to compute a numerical approximation of a partial
curve $\mathcal{L}^\infty_X := \bigcup_{x \in X} \mathcal{L}^x$ of $\mathcal{L}^\infty$ which is
defined on a bounded parameter interval $X \subset \mathbb{R}$. Still we have
$\mathcal{L}^\infty_X \cap \mathcal{V}^0 \subseteq \mathcal{L}$ and therefore the intersection points
of the homotopy path $\mathcal{L}^\infty_X$ with the hypersurface $\mathcal{V}^0$ are dc-operating
points of $\mathcal{N}$. Since we want to find all dc-operating points of $\mathcal{N}$ we have to
estimate bounds for the parameter interval $X$ such that $\mathcal{L}^\infty_X$ contains all
solutions of $\mathcal{N}$. Often, the no-gain property of transistor circuits can be applied for
this task. Because of this property the branch voltages $v_a$ of all solutions $(v,i)$ of the
network $\mathcal{N}$ cannot exceed the boundaries of the interval [0, 0.8] which is determined by
the voltage of the power supply^. Thus, with $X := [0, 0.8]$ the solutions of $\mathcal{N}$ are all
contained in $\mathcal{L}^\infty_X$.
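The search of the homotopy path for its intersection points with the hypersurface $i_a = 0$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `toy_source_current` is a hypothetical stand-in for the source current that a spice dc-sweep would return and is not the characteristic of the flip-flop; the names `sweep_zero_crossings`, `xs` and `ys` are likewise chosen for illustration.

```python
def sweep_zero_crossings(xs, ys):
    """Return sweep parameters at which the sampled quantity y(x) is zero.

    xs: increasing sweep parameter values (e.g. the source voltage x = v_a)
    ys: the swept quantity at each x (e.g. the source current i_a)
    A sign change between neighbouring samples is refined by linear
    interpolation; exact zeros at sample points are reported once.
    """
    roots = []
    for k in range(len(xs) - 1):
        x0, x1, y0, y1 = xs[k], xs[k + 1], ys[k], ys[k + 1]
        if y0 == 0.0:
            roots.append(x0)
        elif y0 * y1 < 0.0:
            roots.append(x0 - y0 * (x1 - x0) / (y1 - y0))
    if ys and ys[-1] == 0.0:
        roots.append(xs[-1])
    return roots


def toy_source_current(x):
    # Hypothetical stand-in for the simulated i_a(x): a cubic with three
    # zeros inside the parameter interval X = [0, 0.8]. Not circuit data.
    return (x - 0.1) * (x - 0.4) * (x - 0.7)


n = 801
xs = [0.8 * k / (n - 1) for k in range(n)]    # sweep over X = [0, 0.8]
ys = [toy_source_current(x) for x in xs]
operating_points = sweep_zero_crossings(xs, ys)
print([round(r, 3) for r in operating_points])
```

Each reported parameter value corresponds to one candidate dc-operating point; the sweep only finds all of them if $X$ was bounded correctly and the step is fine enough to separate neighbouring zeros.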

Often, more than one independent source is needed to break all feedback structures
of a transistor network. Let $\mathcal{N} = (\mathcal{G}, \mathcal{V})$ be such a network and let
$Z_v$ and $Z_i$ be minimal sets of branches in $\mathcal{N}$ that break all feedback structures of
$\mathcal{N}$ if they are replaced by independent voltage and current sources, resp. Analogous to the
introductory example we construct a network $\mathcal{N}^\infty$ by replacing all the branches
in $Z := Z_v \cup Z_i$ by norators and a network $\mathcal{N}^0$ by replacing all branches in
the complement of $Z$ by norators. Again, we have $\mathcal{V} = \mathcal{V}^\infty \cap \mathcal{V}^0$
and therefore $\mathcal{L} = \mathcal{L}^\infty \cap \mathcal{V}^0$. As in the introductory example
we parameterize $\mathcal{L}^\infty$ and search it for the intersection points with $\mathcal{V}^0$.
The above source replacements deliver new networks $\mathcal{N}^x = (\mathcal{G}, \mathcal{V}^x)$ with
$\mathcal{V}^x := \{(v,i) \in \mathcal{V}^\infty \mid \forall b \in Z_v: v_b = x_b,\ \forall b \in Z_i: i_b = x_b\}$
for each prescribed source value assignment $x \in \mathbb{R}^Z$. By Nielsen and Willson the
networks $\mathcal{N}^x$ have unique solutions and the map
$x \in \mathbb{R}^Z \mapsto (v,i) \in \mathcal{L}^x$ parameterizes $\mathcal{L}^\infty$. In most
cases it is possible to find a reasonable bounded parameter set $X \subset \mathbb{R}^Z$ such that
$\mathcal{L}^\infty_X := \bigcup_{x \in X} \mathcal{L}^x$ covers all the solutions of $\mathcal{N}$.
In the case of two replaced branches the set $\mathcal{L}^\infty_X$ can be computed by means of a
parameterized dc-sweep of pspice and the intersection points with $\mathcal{V}^0$ can be found via
the graphic program gnuplot. An example follows in section 4.

Since the computational effort grows exponentially with the number of replaced
branches it is quite desirable to stick to the case of one replaced branch even if
uniqueness cannot be ensured for that case. But then one has to be aware that $\mathcal{L}^\infty$
can consist of several components which may even have turning points. Problems
with turning points can be avoided if one parameterizes the curves by arc length
via a path-following algorithm. Because of lack of space we can only give a simplified
sketch here (for a more detailed presentation see [3,5,9]). The path-following
algorithm is realized with a transient analysis of spice applied to an auxiliary dynamical
network $\tilde{\mathcal{N}} = (\mathcal{G}, \tilde{\mathcal{V}})$. We use here $C^1$ time functions
$v, i$ of voltage and current assignments defined on some time interval $T = [0, t_{end}]$ and
denote the space of all pairs $(v,i)$ of such functions by $\tilde{\mathcal{S}}$. Kirchhoff's set
$\tilde{\mathcal{K}}$ consists of all $(v,i) \in \tilde{\mathcal{S}}$ for which $(v(t), i(t))$
satisfies Kirchhoff's laws at every $t \in T$. The behavioral relation is

$$\tilde{\mathcal{V}} := \{ (v,i) \in \tilde{\mathcal{S}} \mid \forall t \in T:\ (v(t), i(t)) \in \mathcal{V}^\infty,\ \|D_t v(t)\|^2 + \|D_t i(t)\|^2 = 1 \}$$

with the time derivative $D_t$ and an appropriate norm $\|\cdot\|$, and finally the solution set is
$\tilde{\mathcal{L}} = \tilde{\mathcal{V}} \cap \tilde{\mathcal{K}}$. Obviously,
we obtain the identity

$$\tilde{\mathcal{L}} = \{ (v,i) \in \tilde{\mathcal{S}} \mid \forall t \in T:\ (v(t), i(t)) \in \mathcal{L}^\infty,\ \|D_t v(t)\|^2 + \|D_t i(t)\|^2 = 1 \}.$$

The additional behavioral equation $\|D_t v(t)\|^2 + \|D_t i(t)\|^2 = 1$ forces the
curves $t \in T \mapsto (v(t), i(t))$ in $\mathcal{L}^\infty$ to be traced with constant speed, so
that they are parameterized by some equivalent of arc length. This equation can be described by
a spice netlist with entries for capacitors and controlled sources. The set
$\{(v(t), i(t)) \mid (v,i) \in \tilde{\mathcal{L}},\ t \in T\}$, which consists of traces of the
curves $t \in T \mapsto (v(t), i(t))$, is a subset of $\mathcal{L}^\infty$ and can be used as a
replacement for the homotopy path.

^ In the case that the power supply voltage is considerably higher than the threshold
voltage of a base-emitter diode one can take the Ebers-Moll equations into
consideration to estimate narrow bounds for the homotopy parameter.
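The idea of tracing a solution curve at unit speed, so that turning points cause no trouble, can be illustrated outside of spice by a simple predictor-corrector scheme. The following Python sketch is not the netlist-based realization described above: it is a generic Euler-predictor, Newton-corrector tracer, and the curve `F`, the step size and all names are hypothetical choices made for illustration.

```python
import math


def F(v, i):
    # Hypothetical S-shaped characteristic v = i^3 - 3 i (not from the paper).
    # Seen as a function of v it is multivalued, with turning points at i = -1, +1.
    return v - i**3 + 3.0 * i


def grad_F(v, i):
    return 1.0, -3.0 * i**2 + 3.0


def trace(v, i, ds, steps):
    """Follow the curve F(v, i) = 0 in steps of approximate arc length ds."""
    path = [(v, i)]
    tv_old, ti_old = 0.0, 1.0                # initial orientation: increasing i
    for _ in range(steps):
        fv, fi = grad_F(v, i)
        norm = math.hypot(fv, fi)
        tv, ti = -fi / norm, fv / norm       # unit tangent, orthogonal to the gradient
        if tv * tv_old + ti * ti_old < 0.0:  # keep the direction of travel
            tv, ti = -tv, -ti
        v, i = v + ds * tv, i + ds * ti      # predictor: step along the tangent
        for _ in range(5):                   # corrector: Newton steps back onto the curve
            fv, fi = grad_F(v, i)
            r = F(v, i) / (fv * fv + fi * fi)
            v, i = v - r * fv, i - r * fi
        tv_old, ti_old = tv, ti
        path.append((v, i))
    return path


path = trace(-8.125, -2.5, 0.05, 500)        # start point lies on the curve
dv = [b[0] - a[0] for a, b in zip(path, path[1:])]
turning_points = sum(1 for a, b in zip(dv, dv[1:]) if a * b < 0.0)
crossings = sum(1 for a, b in zip(path, path[1:]) if a[0] * b[0] < 0.0)
print(turning_points, crossings)
```

Here `turning_points` counts the sign changes of the increment in $v$ along the path, i.e. the places where a sweep in $v$ would fail, and `crossings` counts how often the traced curve passes the value $v = 0$.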



4 Examples 

• Analogous to the introductory example, the dpc-method is applied to the network
in figure 6 and the associated homotopy path is computed by a dc-sweep of spice.
To this end, a family of networks $\mathcal{N}^x$ is constructed by exchanging the voltage source
for a current source with source current $x \in X$ (replacing the behavioral equation
$v_a = 5$ with $i_a = x$). The networks $\mathcal{N}^x$ have unique solutions since they are not
reducible to the feedback structure. The bounds of the parameter interval $X = [0, 0.011]$
can easily be estimated by the cutset equation for the branches $R_1$, $R_2$,
$R_3$, $a$ and by the fact that no branch voltage exceeds the value 5. From the spice-computed
dc-sweep in figure 7 one sees that the behavioral equation $v_a = 5$ of $\mathcal{N}$
is satisfied for five parameter values $x$ (which equal $i_a$). Therefore $\mathcal{N}$ exhibits five
dc-operating points. This example suggests that sometimes it may be useful to use
our method instead of the usual voltage source stepping for global dc-analysis.

• Next we sketch the transfer-characteristic method from [8]. Consider again the
network $\mathcal{N}$ in figure 6 but now with the transistor substituted by its terminal
equivalent subnetwork (such a kind of substitution is due to Kronenberg [1]).
For a global dc-analysis a family $(\mathcal{N}^x)_{x \in X}$ of networks is constructed. In each
$\mathcal{N}^x$ the controlled source is replaced by an independent voltage source with source
voltage $x$ (see the corresponding subnetwork in figure 6). Uniqueness of the
solution of $\mathcal{N}^x$ can be proven by the methods given in [7]. The parameter interval
$X = [0, 0.84]$ is found with the help of the estimation for the current of the power
supply (see above) and the Ebers-Moll equations. From the result of a spice dc-sweep
(see figure 8 for the relevant section of it) one reads off the parameter values
$x$ (equal to $v_c$) which fulfill the behavioral equation $v_b = v_c$ of the controlled source.
These parameter values correspond to the solutions of $\mathcal{N}$ (cf. [8] for details).
• At least two voltage sources have to be inserted into the network $\mathcal{N}$ of figure 9
to break all feedback structures. With the aid of a parametric dc-sweep of pspice
it is possible to compute the solutions of the resulting networks $\mathcal{N}^x_{a,b}$ (see figure
10) in dependency of the source voltages $x_a, x_b \in [0, 5]$. The bounds of the two-dimensional
parameter set $X := [0, 5]^{\{a,b\}}$ are induced by the source voltage of the
power supply of $\mathcal{N}$. The set $\mathcal{L}_{a,b}$ of all solutions of the networks
$\mathcal{N}^x_{a,b}$, $x \in X$, is a two-dimensional manifold which can be parametrized by the
source voltages $x_a$ and $x_b$. The projection of the relevant part of $\mathcal{L}_{a,b}$ onto the
$v_a$-, $i_a$- and $i_b$-components is depicted in figure 12. The points of $\mathcal{L}_{a,b}$
satisfying $i_a = 0$ and $i_b = 0$ are the solutions of $\mathcal{N}$. They are determined in two
steps. At first the set $\mathcal{L}_a \subset \mathcal{L}_{a,b}$ of points with $i_b = 0$ is
computed via a contour plot of the well-known freely available program gnuplot. This set is the
union of the solution sets (for $x_a \in [0, 5]$) of networks $\mathcal{N}^{x_a}_a$ with only one
inserted voltage source at branch $a$ (see figure 11). The projection of $\mathcal{L}_a$ onto the
$(v_a, i_a)$-plane is also shown in figure 12. From this projection the parameters $v_a$ with
$i_a = 0$, i.e. the $v_a$-values corresponding to solutions of $\mathcal{N}$, can be read off. The
corresponding $v_b$ values are similarly determined. In this way one obtains the parameter values
$v_a$ and $v_b$ for the nine dc-operating points of the network $\mathcal{N}$. Figure 12 shows that
a dc-analysis of the networks $\mathcal{N}^{x_a}_a$ (see figure 11) with $x_a$ as the sweep
parameter will not succeed since $\mathcal{L}_a$ cannot be parameterized by $x_a$.

• The last example shows the application of the dpc-method to a somewhat larger
transistor network. The circuit in figure 13 represents a simple operational amplifier
which models the latch-up effect of the μA 709 (see [2]). This amplifier is applied in
network $\mathcal{N}$ (see figure 14) as a voltage follower with input connected to ground by a
10k resistor. For the computation of dc-operating points of this transistor network
the load R has been replaced by a voltage source and a dc-analysis was carried out
with the source voltage as the dc-sweep parameter. In the sense of the proof above
this is interpreted as exchanging the load for a norator and then covering the v-i-relation
of the resulting network with the v-i-relations of a family of networks where
the norator is replaced with voltage sources. The $i_o$-component of the corresponding
solution is shown as a bold curve in figure 15. The light straight line in figure 15 is
the v-i-characteristic of a resistor with value 50. If the load resistance is larger than
50 then the circuit $\mathcal{N}$ has at least three dc-solutions, i.e. it exhibits the latch-up
effect.

Note that we did not break all feedback structures in network $\mathcal{N}$ by inserting
the voltage source. Therefore the parameterization of the solution set by $v_o$
is not necessarily unique and it is possible that there exist some other solution
components which have been skipped by the dc-analysis (cf. previous example).
In particular, vertical line segments in a spice-computed dc-plot as they
occur in figure 15 can indicate that the solution manifold consists of more than
one component. It may be that such a line segment does not belong to the solution
manifold but connects two separate components of it. In order to exclude
this case, the characteristic in figure 15 has been verified with the help of the curve
tracing algorithm by J. Haase which can also be implemented as a spice netlist.
This last example shows that the dpc-method can be useful for detecting multiple
dc-solutions even if a well-defined parameterization is not ensured.



5 Conclusions 

With the help of homotopies, the global search for all dc-operating points of a network
becomes a global search of a low-dimensional parameter space. Interesting projects for
larger networks are (1) an effective topological algorithm for the highly complex
task of placing a minimal number of independent sources needed to break all feedback
structures, and (2) an algorithm for the global dc-analysis based on the proposed
homotopy method that can compete with existing global algorithms.

We thank the anonymous reviewers very much for their recommendations. 



References 

1. Kronenberg, L., Trajkovic, Lj., and Mathis, W.: Analysis of feedback structures
and their effect on multiple dc-operating points. Proc. ECCTD'99, Stresa, Italy,
Aug. 1999, pp. 683-686.

2. Widlar, R. J.: Design Techniques for Monolithic Operational Amplifiers. IEEE 
Journal of Solid-State Circuits, vol. SC-4, No. 4, August 1969 

3. Haase, J.: Computation of transfer characteristics of multivalued resistive non- 
linear networks. Proc. SSCT82, Part: Short Communications. (1982) 286-272 

4. Nielsen, R. O., Willson, A. N.: A Fundamental Result Concerning the Topology 
of Transistor Circuits with Multiple Equilibria. Proceedings of the IEEE on 
Circuits and Systems. 68 (1980) 196-208 







Tobias Nahring, Albrecht Reibiger 



5. Ushida, A., Chua, L. O.: Tracing solution curves of non-linear equations with 
sharp turning points. International Journal of Circuit Theory and Applications. 
12 (1984) 1-22 

6. Willson, A. N.: The no-gain property for networks containing three-terminal 
elements. IEEE Trans. Circuits and Systems. CAS-22 (1975) 678-687 

7. Hasler, M., Neirynck, J.: Nonlinear Circuits. Artech House, Inc., Norwood, 1986.

8. Reibiger, A., Mathis, W., Nahring, T., Trajkovic, Lj., Kronenberg, L.: Mathematical
Foundations of the TC-Method for Computing Multiple DC-Operating
Points. XI. ISTET'01 preprints CD-ROM, Linz, Austria, 2001.

9. Nahring, T., Reibiger, A.: Beiträge zur Arbeitspunktberechnung resistiver
Netzwerke. Kleinheubacher Berichte, 45 (2001) 262-265




Fast Calculation of Space Charge in Beam 
Line Tracking by Multigrid Techniques 



Gisela Poplau^*, Ursula van Rienen^, Marieke de Loos^, and Bas van der Geer^ 

^ Rostock University, D-18051 Rostock, Germany 

^ Eindhoven University of Technology, NL-5600 MB Eindhoven, The Netherlands 
^ Pulsar Physics, NL-3762 XA Soest, The Netherlands 

Dedicated to Professor Manfred Tasche on the occasion of his 60th 
birthday 



Abstract. Numerical prediction of charged particle dynamics in accelerators is 
essential for the design and understanding of these machines. The calculation of 
space charge forces influencing the behaviour of a particle bunch is still a bottleneck 
of existing tracking codes. 

We report on our development of a new 3D space-charge routine in the General 
Particle Tracer (GPT) code. It scales linearly with the number of particles in terms 
of CPU time, allowing over a million particles to be tracked on a normal PC. The 
model is based on a non-equidistant multigrid Poisson solver that is used to solve 
the electrostatic fields in the rest frame of the bunch. 

A reliable multigrid scheme for the tracking of particles should be very fast, 
stable and show good convergence for a great variety of meshes. Numerical results 
demonstrate the effect of the choice of the multigrid components. Further, the values 
of physical quantities show good agreement compared to the values calculated by 
a well-tested 2D routine in the GPT code. 



1 Introduction 

Nowadays, particle accelerators play an important role for scientific research as well 
as for medical and industrial applications. Demanding applications such as colliders 
and free electron lasers (FELs) require very high quality electron bunches, where 
any anomaly severely degrades the final performance. Hence, design and operation 
of accelerators require efficient numerical simulations. 

A powerful tool widely used for the study of the behaviour of charged beams 
is the General Particle Tracer (GPT) [3]. It calculates the trajectories of a large 
number of sample particles through the combined external and self-induced fields 
generated by the charged particles (the so-called space-charge forces). Depending on 
charge density and energy, a direct point-to-point model cannot be used to calculate
space-charge forces because of granularity problems and the inherent $O(N^2)$ scaling
between the number of sample particles and CPU time [2].

A method to stabilize the calculations and to save CPU time considerably is to
restrict calculations to 2D, while assuming symmetry properties of the particle 
bunch. In this paper we introduce a 3D model for the fast calculation of space- 
charge. The algorithm is a further development of a method given in [9]. The main 
idea is that space-charge fields are computed in the rest frame of a particle bunch 
by a non-equidistant multigrid scheme. Hence, the numerical effort scales linearly 
with the number of particles in terms of CPU time. The new model is well suited 
for a variety of applications, especially for the handling of non-linear fields (see 
Figure 1). 





Fig. 1. Density plot of the projection of the non-linear transverse electric field of a
hard-edged bunch with a radius of 1 mm, a length of 0.1 mm and a charge of 1 nC,
as calculated by the new 3D space-charge routine in GPT. The calculation is based
on one million particles on a 129 × 129 × 129 mesh.



The new 3D multigrid based space-charge routine enables the tracking of a 
million particles on a normal PC. Important questions for the efficiency of the 
algorithm are the construction of the grid adapted to the particle distribution and 
the choice of a reliable multigrid scheme. 



2 The 3D Space-Charge Model 

The particle tracking is performed by solving the relativistic equations of motion 
for a set of macro particles (sample particles) representing the distribution of the 
particles of a bunch. In the GPT code a 5th order embedded Runge-Kutta scheme
with adaptive step size control is implemented for the numerical integration of these 
equations [2]. In each time step of the numerical integration the space-charge fields 
have to be taken into account. The space-charge calculation with the 3D model is 
performed as follows: 

1. Transformation of the particles from the laboratory frame to the rest frame
by Lorentz transformation. The rest frame is here defined as the frame in which the
average particle momentum is zero.

2. Determination of a non-equidistant 3D Cartesian grid in correspondence to the
charge density of the bunch (see subsection 2.1).

3. Approximation of the charge at the grid points. 

4. Calculation of the electrostatic potential at the grid points via Poisson’s equa- 
tion applying a multigrid algorithm. The finite difference scheme is used for 
the discretization of Poisson’s equation (see subsection 2.2). 

5. Derivation of the electric field in the rest frame and trilinear interpolation of 
the field values to the particle positions. 

6. Transformation of the field to the laboratory frame by Lorentz transformation. 

The efficiency and accuracy of the space-charge calculation depend mainly on the
determination of the 3D mesh and on the multigrid scheme applied to solve Poisson’s
equation. We describe both in the next two subsections.
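Step 5 of the list above, the trilinear interpolation of grid values to a particle position, might look as follows on a non-equidistant Cartesian grid. This is a minimal Python sketch and not the GPT implementation; the function name `trilinear`, the nested-list field layout and the test grid are assumptions made for illustration.

```python
from bisect import bisect_right


def trilinear(xs, ys, zs, field, p):
    """Interpolate grid values field[i][j][k], given on the (possibly
    non-equidistant) mesh lines xs, ys, zs, to the point p = (x, y, z)."""
    def cell(lines, c):
        # index of the mesh cell containing c (clamped to the grid) and the
        # relative position of c inside that cell
        n = min(max(bisect_right(lines, c) - 1, 0), len(lines) - 2)
        return n, (c - lines[n]) / (lines[n + 1] - lines[n])

    (i, u), (j, v), (k, w) = cell(xs, p[0]), cell(ys, p[1]), cell(zs, p[2])
    value = 0.0
    for di, wu in ((0, 1.0 - u), (1, u)):
        for dj, wv in ((0, 1.0 - v), (1, v)):
            for dk, ww in ((0, 1.0 - w), (1, w)):
                value += wu * wv * ww * field[i + di][j + dj][k + dk]
    return value


# Hypothetical test data: an affine field is reproduced exactly.
xs = [0.0, 0.1, 0.3, 1.0]
ys = [0.0, 0.5, 0.6, 1.0]
zs = [0.0, 0.2, 0.9, 1.0]
g = lambda x, y, z: 2.0 * x - y + 3.0 * z + 1.0
field = [[[g(x, y, z) for z in zs] for y in ys] for x in xs]
value = trilinear(xs, ys, zs, field, (0.2, 0.55, 0.7))
print(value)
```

In a tracking code the same weights would be applied to each of the three field components at every particle position.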



2.1 Mesh Generation 

The electromagnetic potential is calculated on a 3D Cartesian mesh where an ap- 
proximation of the charge in the rest frame is stored at the grid points. The 3D mesh 
is generated in a box surrounding the bunch. To reduce the number of mesh lines 
needed, and thus to reduce CPU time, the density of the mesh lines is increased if 
the charge-density increases. 

The actual positioning of the mesh lines is an iterative process. The mesh lines 
are distributed so that they are spaced according to the distribution of the beam 
charge density. The parameter fn is introduced to limit the difference
in spacing between neighboring mesh lines, to avoid the creation of a non-optimal
mesh line distribution for the Poisson solver. If, e. g., fn = 0.25, then the
spacing between neighboring mesh lines cannot vary by more than 25%. To span
the mesh over the bounding box additional mesh lines are added left and right of 
the bunch. The spacing of these additional mesh lines is increased, restricted to fn, 
to add as few lines as possible. 

The addition of mesh lines and the effect of fn is shown in Figure 2. When 
fn = 0, the spacing between all neighboring mesh lines is allowed to vary by 0%, 
creating an equidistant mesh. Such a mesh is most stable for the multigrid Poisson 
solver, but it will create many empty mesh boxes. On the other extreme, setting 
fn = 0.5 results in a dense sampling of the electron bunch and sparse sampling of 
the surrounding area. 
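The way additional mesh lines outside the bunch can grow in spacing while respecting fn may be sketched as follows. This is an illustration only: the paper does not give the actual rule used in GPT, so the geometric growth by the factor 1 + fn, the shortening of the last spacing to hit the box boundary, and the names `padding_lines`, `edge`, `boundary` and `h0` are all assumptions.

```python
def padding_lines(edge, boundary, h0, fn):
    """Mesh lines from the bunch edge out to the box boundary (edge < boundary).

    Assumption: each new spacing is (1 + fn) times the previous one, starting
    from h0, the spacing of the outermost cell inside the bunch. The last
    spacing is shortened so that the final line lies on the boundary.
    """
    lines = [edge]
    h = h0
    while True:
        h *= 1.0 + fn
        # small tolerance so that round-off does not create a sliver cell
        if boundary - lines[-1] <= h * (1.0 + 1e-9):
            lines.append(boundary)
            return lines
        lines.append(lines[-1] + h)


equidistant = padding_lines(1.0, 5.0, 0.1, 0.0)   # fn = 0: constant spacing
stretched = padding_lines(1.0, 5.0, 0.1, 0.5)     # fn = 0.5: rapid growth
print(len(equidistant), len(stretched))
```

With fn = 0 the padding is equidistant and needs many lines; a larger fn covers the same distance with far fewer lines at the price of a less uniform mesh.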



2.2 The Multigrid Poisson Solver 

After creating the mesh and approximating the charge of the particles on the mesh 
points, the space-charge forces can be calculated by means of Poisson’s equation 
given by 



$$-\Delta\varphi = \frac{\varrho}{\varepsilon_0} \quad \text{in } \Omega \subset \mathbb{R}^3.$$

Here, $\varphi$ denotes the potential, $\varrho$ the charge density and $\varepsilon_0$ the
dielectric constant. The domain $\Omega$ is a box surrounding the particle bunch. On the boundary
Dirichlet or open boundary conditions are assumed. First, Poisson’s equation is discretized by
finite differences using the non-equidistant mesh described in the previous section.
The solution of the resulting system of equations (with up to 1 million degrees of
freedom) requires a fast and robust solver.

Fig. 2. Mesh line positions ((x, y)-plane) for a Gaussian charge density with fn = 0
(top) and fn = 0.5 (bottom). The vertical axis shows the total charge in each mesh
box, where the height of the top has been normalized in both plots.

The state of the art is to apply a multigrid method as Poisson solver. In
model cases the numerical effort scales linearly with the number of mesh points. The multigrid
algorithm operates on a certain number of grids starting with the mesh given
by the discretization of Poisson’s equation. Then a sequence of coarser grids is
generated by removing mesh lines. On an equidistant mesh every second mesh line
is removed. Now iteratively, a rough approximation of the solution of the system
of equations is obtained by the application of a few steps of a relaxation scheme
(e. g. Gauss-Seidel), which is called pre-smoothing. This approximation is then improved
by a correction vector obtained on the coarser grids (the so-called coarse
grid correction), where restriction and interpolation work as grid transfer operators.
After applying interpolation another few steps of relaxation are necessary (post-smoothing).
For more details see [4,1].
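The interplay of pre-smoothing, restriction, coarse grid correction, interpolation and post-smoothing can be made concrete with a deliberately simplified sketch. It treats the 1D model problem $-u'' = f$ with homogeneous Dirichlet conditions on an equidistant grid, not the 3D non-equidistant problem of the paper, and uses lexicographic Gauss-Seidel, full-weighting restriction and linear interpolation; all function names are chosen for illustration.

```python
def gauss_seidel(u, f, h, sweeps):
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def v_cycle(u, f, h, pre=2, post=2):
    n = len(u) - 1                       # number of intervals, a power of two
    if n == 2:                           # coarsest grid: one unknown, solved exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return
    gauss_seidel(u, f, h, pre)           # pre-smoothing
    r = residual(u, f, h)
    rc = [0.0] * (n // 2 + 1)            # restriction: every second line is removed
    for j in range(1, n // 2):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = [0.0] * (n // 2 + 1)
    v_cycle(ec, rc, 2.0 * h, pre, post)  # coarse grid correction
    for j in range(n // 2):              # linear interpolation of the correction
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    gauss_seidel(u, f, h, post)          # post-smoothing


n = 64
h = 1.0 / n
f = [1.0] * (n + 1)
u = [0.0] * (n + 1)
for _ in range(10):
    v_cycle(u, f, h)
# for f = 1 the discrete solution coincides with u(x) = x (1 - x) / 2
max_error = max(abs(u[i] - 0.5 * (i * h) * (1.0 - i * h)) for i in range(n + 1))
print(max_error)
```

The work per cycle is proportional to the number of mesh points, because each coarser level has half as many unknowns as the one above it.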

As shown in [7,8] the coarsening strategy is crucial for the convergence of the 
multigrid algorithm on non-equidistant grids. The generation of coarse grids with 
every second grid line removed as suggested in [1] would lead to meshes with large 
aspect ratio. Hence the convergence of a multigrid scheme on such grids would 
considerably slow down. Here, the removal of mesh lines follows the rule: Two
neighboring steps $h_1$ and $h_2$ remain also in the next coarser grid as long as either
$h_1 > s h_{min}$ or $h_2 > s h_{min}$, where $h_{min}$ denotes the overall minimal step size of
the corresponding fine level. The factor $s$ is chosen as $s = 1.6$ or $s = 1.7$ with the
objective to obtain a decreasing aspect ratio of the mesh spacing.
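One possible reading of this coarsening rule along a single coordinate direction is sketched below. The paper does not specify how the pairs of neighboring steps are traversed, so processing the steps in non-overlapping pairs from left to right is an assumption, as are the function name `coarsen` and the example meshes.

```python
def coarsen(lines, s=1.6):
    """Return the mesh lines of the next coarser level (one direction).

    lines: strictly increasing mesh line positions of the fine level.
    Assumption: steps are processed in non-overlapping pairs (h1, h2). If
    either step exceeds s * h_min both steps are kept, otherwise the line
    between them is removed.
    """
    steps = [b - a for a, b in zip(lines, lines[1:])]
    h_min = min(steps)
    coarse = [lines[0]]
    k = 0
    while k + 1 < len(steps):
        if steps[k] > s * h_min or steps[k + 1] > s * h_min:
            coarse += [lines[k + 1], lines[k + 2]]   # both steps remain
        else:
            coarse.append(lines[k + 2])              # merge the two small steps
        k += 2
    if k < len(steps):                               # odd number of steps
        coarse.append(lines[-1])
    return coarse


uniform = coarsen([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
graded = coarsen([0.0, 1.0, 2.0, 3.0, 5.0, 9.0])
print(uniform, graded)
```

On an equidistant mesh this reduces to removing every second line, while on a graded mesh only the fine steps are merged, so the ratio between the largest and the smallest step shrinks from level to level.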

On the other hand, the choice of the multigrid parameters, such as the number of
pre- and post-smoothing steps or the application of full or half restriction, considerably
influences the performance of the multigrid scheme. If the convergence of the multigrid
algorithm turns out to be insufficient, multigrid can be applied as a preconditioner
for the conjugate gradient algorithm. This method leads to a better convergence at
least in cases where a plain multigrid scheme converges too slowly [6,5].



3 Numerical Examples 

3.1 Numerical Tests with the Multigrid Algorithm 

The multigrid method has been tested with a sphere filled with charged particles 
with Gaussian distribution. The space charge effect for only one time step has been 
computed. 




Fig. 3. Number of multigrid iterations for several multigrid schemes (MG: multi- 
grid, MG-PCG: multigrid preconditioned conjugate gradients). The calculations 
have been performed with a spherical bunch of 10,000 particles, where the parti- 
cles have a Gaussian distribution. The multigrid iteration has been performed to a 
relative residual less than 10“^^ in the maximum norm. 



In order to find the best-performing multigrid algorithm for the
particle tracking, several components of the scheme have been tested. The multigrid
algorithm is performed as V-cycle. The red-black Gauss-Seidel relaxation has been
taken as smoothing operator with 2 pre- and 2 post-smoothing steps (MG(2,2)).
First, full restriction has been tested versus half restriction. Figs. 3 and 4 show that
the multigrid scheme performed with half restriction is very sensitive to certain
grids, i. e. it converges slowly on them. These are meshes where the sequence of coarse grids
has no descending aspect ratio.




Fig. 4. CPU time for several multigrid schemes (MG: multigrid, MG-PCG: multigrid
preconditioned CG) on an 800 MHz Pentium PC. The calculations have been
performed with a spherical bunch of 10,000 particles, where the particles have a
Gaussian distribution.



The application of multigrid as preconditioner for the conjugate gradient method
requires that the multigrid scheme is a symmetric and positive operator [5,6]. Thus
the components have been chosen as follows: two pre-smoothing steps with red-black
Gauss-Seidel relaxation, two post-smoothing steps with black-red Gauss-Seidel relaxation
and full restriction. Two V-cycles have been performed per CG iteration
step (MG-PCG(2,2)(2)).



3.2 Tracking Example 

During a drift, an electron bunch expands both longitudinally and radially due to
the space-charge forces. Figure 5 shows a simulation of a hard-edged bunch, starting
with an energy corresponding to a Lorentz factor of $\gamma = 5$. The bunch has a total
charge of 1 nC, a radius of 1 mm and a length of 0.1 mm ('pancake' bunch). After
100 ps, the hard edges have become smooth, and the density at the head of the
bunch is larger than at the tail due to relativistic effects.

The expansion calculated with the new 3D space-charge routine has been compared
to the expansion simulated with the well-tested cylindrically symmetric 2D
space-charge routine of GPT [3]. The result for the bunch length is shown in Figure 6.
It demonstrates the perfect agreement between the 2D and the 3D routine,
even with 1000 particles.

Fig. 5. Initial (left) and final (right) projections of the charge density of an expanding
’pancake’ bunch in the (a;, ^)-plane. One million particles are used on a
65 × 65 × 65 mesh.




Fig. 6. Bunch length (expressed as standard deviation) and final emittance after 
tracking a ’pancake’ bunch with total charge of 1 nC during 100 ps. Tracking a 
million particles with the 3D model takes only 30 minutes CPU time on a 1.6 GHz 
Pentium PC. 



Although the emittance, a measure of transverse focusability of the bunch, converges
a little more slowly, Figure 6 shows good agreement with the 2D routine already with
1000 particles. The value with 10,000 particles is identical to the 2D case,
where about 1000 ’rings’ are sufficient. It should be noted that this example is a
quite extreme case where the emittance explodes from 0 to about 3 μm in just
30 mm.



4 Conclusions 



A new 3D space-charge routine implemented in the GPT code has been described
in this paper. The new method allowing 3D simulations with a large number of
particles on a common PC is based on a multigrid Poisson solver for the calculation
of the electrostatic potential in the rest frame. Numerical results of the 3D routine
show perfect agreement with the standard 2D space-charge model of the GPT code.



Abstract. In electric circuits, signals often include widely separated frequencies.
Thus numerical simulation demands a large amount of computational work, since 
the fastest rate restricts the integration step size. A multidimensional signal model 
yields an alternative approach, where each time scale is given its own variable. Con- 
sequently, underlying differential algebraic equations (DAEs) change into a PDAE 
model, the multirate partial differential algebraic equations (MPDAEs). A time domain
method to determine multiperiodic MPDAE solutions is presented. The corresponding
discretisations rely on the specific information transport in the MPDAE system
along characteristic curves. In contrast, general time domain methods produce unphysical
couplings. Hence the characteristic-based approach yields enormous savings in
computational time and memory in the linear algebra part. This technique is applied
to driven oscillators including two periodic time scales as well as to oscillators
where one periodic rate is forced and the other is autonomous.



1 Introduction 

In radio frequency applications, effects like filtering, mixing and frequency modu- 
lation produce signals with largely differing time scales. Hence transient analysis 
of the corresponding circuit’s equations becomes tedious, because the fastest rate 
limits the integration step size, whereas the slowest rate determines the total time 
interval. Numerical techniques, which compute directly the steady state response 
of a circuit, e.g. shooting methods or harmonic balance, are often time-consuming 
in view of this multirate behaviour, too. 

Alternatively, the introduction of independent variables for each time scale en- 
ables an efficient representation of multitone signals and thus creates a useful post- 
processing tool. Since a network approach yields differential algebraic equations 
(DAEs) to model a circuit [5], this strategy results in a PDAE description, the 
multirate partial differential algebraic equations (MPDAEs). Brachtendorf et al. 
[1] proposed this PDAE model and successfully applied it in frequency domain 
by a multidimensional generalisation of harmonic balance. In this connection, a 
multiperiodic MPDAE solution identifies the quasi-periodic response of a circuit. 
Therefore sophisticated numerical PDAE methods are required to solve the system 
efficiently. 

The arising time rates may be driven or autonomous. The presence of only
autonomous scales makes the computation of multiperiodic MPDAE solutions difficult,
cf. [9]. However, the MPDAE model enables the efficient calculation of
envelope-modulated signals also in the purely autonomous case [2]. A mixture of
driven and autonomous rates permits frequency modulation. Therefore the PDAE
concept has been modified for this case by Narayan and Roychowdhury [12], which
leads to the warped MPDAE model with stretched time scales.

The MPDAE system implies an information transport along characteristic curves,
see [8]. This property allows the construction of efficient time domain methods,
which is realised for driven oscillators in [10].

In this paper, we outline the design of numerical techniques in time domain 
using the structure of the characteristic curves. Sect. 2 gives a brief description of 
the MPDAE model. Then the characteristic system of the MPDAE is introduced,
which yields an appropriate semi-discretisation scheme. In Sect. 4, we apply this
approach to solve the MPDAE of a driven oscillator, namely a transistor modulator 
circuit. Finally, the method of characteristics is generalised to determine MPDAE 
solutions with a driven as well as an autonomous time scale, where a forced van 
der Pol oscillator demonstrates the interactions of the given time rates. 



2 Multivariate Functions and MPDAE Model 

Oscillatory signals with widely separated time scales arise in radio frequency applications.
Introducing a separate variable for each time scale yields an alternative
multidimensional model. A simple example is the following transition



b{t) 6(ti, ^ 2 ) = sin(^ti) sin(^t 2 ) (Ti > T 2 ). (1) 



The new representation b is called the multivariate function (MVF) of the sig- 
nal b. Since the MVF is biperiodic here, we have just to consider the rectangle 
[0, Ti[x[0, T 2 [ in time domain. Note that only one sinusoid occurs per coordinate 
direction in this rectangle for any periods Ti,T 2 . Hence the MVF owns the advan- 
tage of being easy to sample. Still we can reconstruct the original signal completely 
from its MVF, because b{t) = b{t^ t) holds. 
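
The sampling argument can be made concrete with a small sketch. The periods `T1`, `T2` and the resolution of 20 points per oscillation are hypothetical values chosen only for illustration; the functions follow the example (1) and the diagonal relation between the signal and its MVF.

```python
import math

T1, T2 = 1.0, 0.01        # hypothetical periods with T1 > T2
PTS_PER_CYCLE = 20        # hypothetical resolution per oscillation


def b_hat(t1, t2):
    """Multivariate function (MVF) of the two-tone example (1)."""
    return math.sin(2.0 * math.pi * t1 / T1) * math.sin(2.0 * math.pi * t2 / T2)


def signal_from_mvf(t):
    """Recover b(t) on the diagonal, using the biperiodicity of the MVF."""
    return b_hat(t % T1, t % T2)


# Sampling b(t) directly over one slow period needs every fast cycle resolved,
# whereas the MVF only needs one cycle per coordinate direction.
points_direct = PTS_PER_CYCLE * round(T1 / T2)
points_mvf = PTS_PER_CYCLE * PTS_PER_CYCLE

print(points_direct, points_mvf)
```

The gap between the two counts grows with the ratio of the periods, which is the reason the multivariate representation pays off for widely separated time scales.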

In circuit simulation, Kirchhoff’s laws yield implicit systems of differential equa- 
tions, cf. [6], which we write in the form 



$$ \frac{d}{dt} q(x(t)) = f(b(t), x(t)), \qquad (2) $$



where the real vector $x(t)$ denotes unknown time-dependent node voltages or branch 
currents and the real vector-valued function $q$ represents a charge term. In general, 
the Jacobian matrix of $q$ will be singular, which makes (2) a differential algebraic 
equation (DAE). Such systems raise theoretical and numerical problems like the index 
concept or the demand for consistent initial values [4]. The right-hand side $f$ 
depends on the real vector of input signals $b(t)$. Assuming the said multirate 
behaviour for $b$, we introduce corresponding MVFs $\hat{b}$. In the following, we 
restrict ourselves to the most frequent case of two widely separated time scales. 
Extensions to more rates are straightforward. The introduction of multidimensional 
functions transforms the DAE model (2) of the circuit into the above mentioned PDAE 
formulation, the multirate partial differential algebraic equation (MPDAE) 



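$$ \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_1} + \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_2} = f(\hat{b}(t_1,t_2), \hat{x}(t_1,t_2)) \qquad (3) $$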



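with the MVF $\hat{x}$ of $x$. It is easy to show that a solution $\hat{x}$ of (3) 
produces a solution of (2) via $x(t) = \hat{x}(t,t)$. Thereby, a biperiodic MPDAE 
solution yields a two-tone quasi-periodic DAE solution, i.e. a function of the form 

$$ x(t) = \sum_{j_1,j_2=-\infty}^{+\infty} X_{j_1,j_2} \exp\left[ i 2\pi \left( \frac{j_1}{T_1} + \frac{j_2}{T_2} \right) t \right] \qquad (4) $$

with complex coefficient vectors $X_{j_1,j_2}$. 
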

Sufficient and necessary existence theorems for multiperiodic MPDAE solutions 
in connection with quasi-periodic DAE solutions can be found in [11]. Therefore 
the MPDAE model provides an alternative approach for the calculation of quasi- 
periodic signals in radio frequency applications. Although a PDAE has to be solved 
now, the simple behaviour of MVFs causes savings in computational time, if efficient 
numerical techniques are used. 
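
The step from the MPDAE (3) back to the DAE (2) can be written out with the chain rule. The following is a short sketch, writing $\hat{x}$ and $\hat{b}$ for the MVFs and assuming that $q$ and $\hat{x}$ are sufficiently smooth:

```latex
\frac{d}{dt}\, q(\hat{x}(t,t))
  = \left. \left[ \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_1}
      + \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_2} \right] \right|_{t_1 = t_2 = t}
  = f(\hat{b}(t,t), \hat{x}(t,t))
  = f(b(t), x(t)) .
```

The first equality is the chain rule along the diagonal $t_1 = t_2 = t$, the second inserts (3), and the third uses $b(t) = \hat{b}(t,t)$ together with the definition $x(t) = \hat{x}(t,t)$.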



3 Characteristic System of MPDAE 

A detailed analysis of the system (3) exhibits a hyperbolic structure, cf. [8]. 
Accordingly, information transport proceeds along characteristic curves, which are 
just the straight lines in diagonal direction, i.e. $(\tau + c, \tau)$ with a 
parameter $\tau \in \mathbb{R}$ for any fixed $c \in \mathbb{R}$. Consequently, we 
define a characteristic system of (3) by 

$$ \frac{d}{d\tau} q(x_c(\tau)) = f(\hat{b}(\tau + c, \tau), x_c(\tau)), \qquad (5) $$

which represents a family of DAEs for the restrictions 
$x_c(\tau) := \hat{x}(\tau + c, \tau)$. Now a function $\hat{x}$ is a solution of (3) 
if and only if the restrictions $x_c(\tau)$ satisfy the characteristic system (5) for 
all $c \in \mathbb{R}$, cf. [10]. 

Our aim is to compute numerical approximations of a biperiodic MPDAE solution with 
periods $T_1$ and $T_2$, without loss of generality $T_1 > T_2$. To obtain an 
appropriate semi-discretisation of the MPDAE (3), we choose a finite number of 
DAEs from the characteristic system (5). Owing to the periodicity condition in the 
first coordinate direction, we consider an initial manifold on the line 
$[0,T_1[ \times \{0\}$ and select the characteristic curves with $c = (j-1)h_1$, 
$h_1 = T_1/n_1$ for $j = 1, \dots, n_1$. Fig. 1 illustrates this strategy. Hence a 
system of $n_1$ independent DAEs 

$$ \frac{d}{d\tau} q(x_j(\tau)) = f(\hat{b}(\tau + (j-1)h_1, \tau), x_j(\tau)), \quad j = 1, \dots, n_1 \qquad (6) $$



arises in this semi-discretisation. Furthermore, the periodicity condition in the sec- 
ond coordinate direction yields boundary conditions for the system (6). For the 




340 R. Pulch 




Fig. 1. Information transport for MPDAE system. 



initial values $x_j(0)$, we have the requirement $x_j(0) = \hat{x}((j-1)h_1, T_2)$. 
The corresponding point $((j-1)h_1, T_2)$ does not necessarily lie on the selected 
characteristic curves. Nevertheless, we are able to interpolate the value at this 
point from the quantities $x_j(T_2)$ on the line $t_2 = T_2$. Thus the boundary 
condition results in 

$$ (x_1(0), \dots, x_{n_1}(0))^T = B \, (x_1(T_2), \dots, x_{n_1}(T_2))^T \qquad (7) $$

with a constant real matrix $B$ whose entries depend on the interpolation formula 
used. Hence the constants $T_1$, $T_2$ and $h_1$ also influence $B$. Moreover, 
polynomial interpolation of a low degree creates a sparse matrix $B$. 

By solving the boundary value problem (6), (7) for DAEs, we obtain an approxi- 
mation to a biperiodic solution of the MPDAE (3) in the underlying parallelogram, 
see Fig. 1. Using the periodicities, the solution can be interpolated everywhere. 
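
The structure of the boundary condition (7) can be sketched for the simplest case. The code below is an illustration under stated assumptions, not the implementation used in the paper: the unknown is scalar, the interpolation is linear, indices are 0-based, and the function name `boundary_matrix` is hypothetical. Characteristic curve number j starts at $t_1 = j h_1$ and reaches the line $t_2 = T_2$ at $t_1 = j h_1 + T_2$, so the value at the point $(j h_1, T_2)$ is interpolated from its two neighbours on that line, using the periodicity in $t_1$.

```python
import math


def boundary_matrix(n1, T1, T2):
    """Matrix B of (7) for scalar unknowns and linear interpolation (a sketch)."""
    h1 = T1 / n1
    B = [[0.0] * n1 for _ in range(n1)]
    for j in range(n1):
        # Position of the target point j*h1 on the shifted grid i*h1 + T2,
        # measured in units of h1.
        s = (j * h1 - T2) / h1
        i0 = math.floor(s)
        w = s - i0                       # interpolation weight in [0, 1[
        B[j][i0 % n1] += 1.0 - w         # left neighbour (periodic wrap-around)
        B[j][(i0 + 1) % n1] += w         # right neighbour
    return B


B = boundary_matrix(n1=8, T1=1.0, T2=0.3)
nonzeros_per_row = [sum(1 for entry in row if entry != 0.0) for row in B]
print(nonzeros_per_row)
```

Each row carries at most two non-zero weights that sum to one, which reflects the sparsity noted above for low-degree interpolation. For vector-valued unknowns the same weights would act on each component.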

4 Application to Driven Time Scales 

In this section, we assume an oscillator, which is driven by input signals with two 
separate time scales. If a corresponding two-tone quasi-periodic solution $x(t)$ of (2) 
exists, then we know its rates $T_1$, $T_2$ from the frequencies of the input signals 
in $b(t)$. Hence we choose appropriate MVFs $\hat{b}(t_1,t_2)$ and apply the MPDAE 
model (3) with biperiodic boundary conditions for the given periods $T_1$ and $T_2$. 

Consequently, a numerical solution can be computed via the boundary value 
problem of DAEs (6), (7). As test example, we consider a transistor modulator 
circuit. Fig. 2 displays its circuit diagram. Using nodal analysis and the Ebers-Moll 
model for the transistors, a system of four differential equations 

+ *2 + /s [exp(A(ui - Uopi)) - 1] - Otis [exp(Au4) - 1] 

-L^ = -«i 

0 = -Is [exp(A(u4 + Uin 2 )) - 1] + Oils [exp(A(Uin 2 - Uopl)) - 1] 

—Is [exp(Au4) — 1] + Oils [exp(A(ui — Uopi)) — 1] 

—Is [exp(A(u4 + Uini — Uop 2 )) — 1] + Oils [exp(Au3) — 1] 

0 = —Is [exp(AU3) — 1] + Oils [exp(A(u4 + Uinl — Uop 2 )) — 1] 

+ [Uopi — Uop2 + Uinl ~ Us] 



( 8 ) 




Numerical Techniques for Solving MPDAEs 



341 




Fig. 2. Circuit diagram of transistor modulator. 



emerges, which is a DAE of index 1. More details of the derivation and the technical 
parameters can be found in [3]. We choose input signals 
$U_{in1}(t) = 4\,\mathrm{V} \cos(2\pi t/T_1)$ and 
$U_{in2}(t) = 0.1\,\mathrm{V} \cos(2\pi t/T_2)$ with the slightly differing periods 
$T_1 = 0.1$ msec and $T_2 = 0.01$ msec in order to obtain a detailed visualisation of 
the whole signals. A finite difference method is suitable to solve the boundary value 
problem (6), (7), since it can be easily applied to DAEs. We use a discretisation with 
centred differences and linear interpolation in the boundary conditions. Fig. 3 and 
Fig. 4 show the resulting MPDAE solutions for the output voltage $u_1$ as well as for 
the node voltage $u_4$ and the corresponding DAE solutions, which are interpolated from 
the MVFs. For comparison, we solve an initial value problem of the DAE system (8) 
using the RADAU5 integrator. Indeed, an excellent agreement of both solutions 
can be observed in Fig. 3 and Fig. 4, respectively. Simulations for widely separated 
periods $T_1 \gg T_2$ produce the same shape for the MPDAE solutions. 

A finite difference method, which is directly applied to the MPDAE (3), generates 
unphysical couplings and thus increases the computational effort. On the other 
hand, a solution of the problem (6), (7) leads to enormous savings with respect to 
computational time and memory in the linear algebra part, since the technique 
respects the information transport in the system (3). This gain of efficiency is 
investigated in [10], where the ring modulator circuit serves as test example, i.e. 
a DAE of index 2 consisting of fifteen equations. 



5 Application to Autonomous Time Scales 



Fig. 3. MPDAE solution for output voltage $u_1$ (left) and corresponding DAE 
solution (right, x) together with integrated DAE solution (right, solid line) from 
RADAU5. 

Now we consider an autonomous oscillator with a periodic limit cycle. If an 
independent input signal with another period is added, then a solution with two 
separate time scales may arise. As a benchmark, we consider the equation of a forced 
van der Pol oscillator 



$$ \ddot{x} + \mu (x^2 - 1) \dot{x} + k^2 x = A \sin\left(\frac{2\pi}{T_1} t\right), \qquad (9) $$



which represents an ordinary differential equation (ODE). For the autonomous 
oscillator ($A = 0$), a limit cycle with a priori unknown period $T_0$ exists. One can 
apply time or frequency domain techniques to calculate this solution and its period, 
as described in [7], for example. The forcing term on the right-hand side 
($A \neq 0$) introduces another time rate $T_1$, which leads to a two-tone solution of 
(9) with a second time scale $T_2$. Let us assume $k = 2\pi$ and $T_1 \gg T_2$; then 
the behaviour of the fast rate $T_2$ of the new solution depends on the parameter 
$\mu > 0$. On the one hand, for small values, e.g. $\mu < 0.1$, it holds 
$T_2 \approx T_0 \approx 1$. On the other hand, if $\mu$ is large, frequency modulation 
occurs and the rate $T_2$ is not constant any more. Accordingly, we call $T_2^{-1}$ 
the local frequency. 
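
The statement that the period stays close to one for small damping can be checked with a short time integration of the unforced oscillator. This is a sketch under assumptions: it takes the van der Pol equation in the common form $\ddot{x} + \mu(x^2 - 1)\dot{x} + k^2 x = 0$ with $k = 2\pi$, uses a classical Runge-Kutta step instead of the methods cited in [7], and the step size, end time and start values are hypothetical choices.

```python
import math


def vdp_rhs(x, v, mu, k):
    """First-order form of the unforced van der Pol equation (assumed form)."""
    return v, -mu * (x * x - 1.0) * v - k * k * x


def estimate_period(mu, k=2.0 * math.pi, h=1.0e-3, t_end=5.0):
    """Estimate the period from the last two upward zero crossings of x(t)."""
    x, v, t = 2.0, 0.0, 0.0          # start near the limit cycle (amplitude about 2)
    crossings = []
    while t < t_end:
        # One classical fourth-order Runge-Kutta step.
        k1x, k1v = vdp_rhs(x, v, mu, k)
        k2x, k2v = vdp_rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v, mu, k)
        k3x, k3v = vdp_rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v, mu, k)
        k4x, k4v = vdp_rhs(x + h * k3x, v + h * k3v, mu, k)
        x_new = x + h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v_new = v + h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if x < 0.0 <= x_new:
            # Linear interpolation of the crossing time inside the step.
            crossings.append(t + h * (-x) / (x_new - x))
        x, v, t = x_new, v_new, t + h
    return crossings[-1] - crossings[-2]


print(estimate_period(mu=0.1))
print(estimate_period(mu=10.0))
```

For the small parameter the estimate lies very close to one, while the larger parameter gives a visibly longer period even without forcing.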

Fig. 4. MPDAE solution for voltage $u_4$ (left) and corresponding DAE solution 
(right, x) together with integrated DAE solution (right, solid line) from RADAU5. 



The MPDAE model has been adapted to the case of frequency modulation in 
[12]. For a DAE (2) with a single-tone input signal $b$, the warped multirate partial 
differential algebraic equation (WaMPDAE) is defined by 

$$ \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_1} + \nu(t_1) \frac{\partial q(\hat{x}(t_1,t_2))}{\partial t_2} = f(b(t_1), \hat{x}(t_1,t_2)), \qquad (10) $$

where $\nu$ is the local frequency. Therefore the function 

$$ x(t) = \hat{x}(t, \Psi(t)) \quad \text{with} \quad \Psi(t) = \int_0^t \nu(s) \, ds \qquad (11) $$



satisfies the DAE (2). A $(T_1, 1)$-periodic solution $\hat{x}$ gives us a two-tone 
solution $x(t)$ with local frequency $\nu(t)$. The $T_1$-periodic function $\nu(t_1)$ 
is unknown a priori and thus (10) is underdetermined. Hence we add a suitable phase 
condition at the boundary like $\hat{x}_1(t_1, 0) = 0$ for the first component and all 
$t_1$. The necessity of a phase condition is closely related to the fact that for a 
solution $\hat{x}(t_1,t_2)$ of (10) also $\hat{x}(t_1,t_2 + \gamma)$ satisfies the 
WaMPDAE for all $\gamma \in \mathbb{R}$, since the system is autonomous in $t_2$. 
Therefore the solution of (10) yields more information than a special two-tone 
solution of (2), because we can reconstruct a continuum of DAE solutions from just 
one WaMPDAE solution. 
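
The reconstruction (11) of the signal from a warped MVF and a local frequency can be illustrated with a toy example. Everything in the sketch below is hypothetical: the local frequency `nu`, the MVF `x_hat` and all constants are made up for illustration and are not taken from the van der Pol computation. The warped time is the integral of the local frequency, here approximated by the trapezoidal rule.

```python
import math

T1 = 1000.0              # slow period (hypothetical value)
F0, DF = 1.0, 0.1        # hypothetical mean local frequency and modulation depth


def nu(t1):
    """Hypothetical T1-periodic local frequency."""
    return F0 + DF * math.cos(2.0 * math.pi * t1 / T1)


def x_hat(t1, t2):
    """Toy MVF that is 1-periodic in the warped time t2."""
    return math.sin(2.0 * math.pi * t2)


def warped_times(t_end, n):
    """Integral of nu from 0 to t_i on a uniform grid (trapezoidal rule)."""
    h = t_end / n
    psi = [0.0]
    for i in range(n):
        psi.append(psi[-1] + 0.5 * h * (nu(i * h) + nu((i + 1) * h)))
    return h, psi


h, psi = warped_times(t_end=50.0, n=5000)
# Frequency modulated signal: the MVF evaluated along the warped diagonal.
signal = [x_hat(i * h, p) for i, p in enumerate(psi)]
```

Shifting the second argument of `x_hat` by a constant yields another valid signal, which corresponds to the continuum of solutions mentioned above.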

In contrast to the original MPDAE (3), the characteristic curves are no longer 
straight lines. They are given by the relation $t_2 = \Psi(t_1) + c$ with 
$c \in \mathbb{R}$, where $\Psi$ denotes the integral of the local frequency $\nu$ 
from (11), and thus $\nu > 0$ implies monotonic functions. If $\nu$ in (10) is 
constant, then the characteristic curves are again straight lines with gradient $\nu$ 
and the WaMPDAE (10) can be transformed into the original MPDAE (3). 

We apply the WaMPDAE model to the equivalent first order system of the 
forced van der Pol oscillator (9) with the parameters $\mu = 10$, $k = 2\pi$, 
$A = 30$ and $T_1 = 1000$ (dimensions are omitted). Since (9) is an ODE, we call (10) 
a partial differential equation (PDE) now. The resulting WaMPDE can be directly solved 
by a finite difference method with centred differences on a uniform grid, for example. 
Although this discretisation does not respect the information transport and thus 
increases the computational effort, it allows an easy implementation and yields a 
robust scheme. Thereby, we calculate a biperiodic solution together with a local 
frequency function for the above mentioned phase condition. Fig. 5 illustrates the 
computed solution. Obviously, the local frequency is not constant here and exhibits 
a mean value of about 1.25 with variations of about 10%. 

Nevertheless, we are able to tailor the strategy from Sect. 3 to the case of 
constant frequency $\nu \equiv 1/T_2$, which arises for small parameters $\mu$ in the 
forced van der Pol oscillator (9). As mentioned above, a linear transformation changes 
the WaMPDAE (10) into the original MPDAE (3) here. Hence we encounter again 
the boundary value problem (6), (7) of DAEs, which has proved suitable for purely 
forced oscillators. In contrast to this case, the period $T_2 = \nu^{-1}$ is unknown in 
view of the autonomous second time scale. However, the numerical solution of (6), (7) 
together with the time rate $T_2$ is similar to the computation of a periodic limit 
cycle of an autonomous ODE. Following [7], we add $T_2$ to the list of unknowns in 
a numerical method. Furthermore, a scalar phase condition has to be fixed as an 
extra equation in order to extract a unique function from the continuum of MPDAE 
solutions. Solving this extended boundary value problem yields a numerical 
approximation for the biperiodic MPDAE solution and its second period. 

Accordingly, we apply this approach to the forced van der Pol oscillator with 
$\mu = 0.1$. Since Eq. (9) represents an ODE, the resulting system (6) is also an ODE 
system and the arising boundary value problem can easily be solved by means of a 
shooting method. The periodic limit cycle of the completely autonomous oscillator, 
i.e. $A = 0$ in (9), serves as initial values in the corresponding Newton iteration. 
We use the trapezoidal rule in the forward integration of the ODE system. The 
Newton iteration yields $T_2 = 1.0014$ for the second time scale. Fig. 6 shows the 
result for the biperiodic MPDE solution $\hat{x}$. In comparison to the case 
$\mu = 10$ in Fig. 5, we observe a greater extent of amplitude modulation and less 
steep gradients. The first few cycles of the corresponding ODE solution are also 
illustrated in Fig. 6, where interpolated values from the MVF agree with the 
integrated solution using the trapezoidal rule. During further cycles, a phase shift 
between the exact ODE solution and the values from the MPDE solution emerges, because 
the fast rate $T_2$ includes a numerical error here. Nevertheless, the MPDE model 
completely captures the qualitative behaviour of the quasi-periodic signal. 

Fig. 5. Local frequency $\nu$ (left) and biperiodic solution $\hat{x}$ (right) of 
WaMPDE for the forced van der Pol oscillator with $\mu = 10$. 

Fig. 6. Biperiodic solution $\hat{x}$ (left) of MPDE for the forced van der Pol 
oscillator with $\mu = 0.1$ and corresponding ODE solution (right, x) together with 
integrated ODE solution (right, solid line) using trapezoidal rule. 



6 Conclusions 

The multivariate signal model demands sophisticated techniques to solve the re- 
spective partial differential algebraic equations efficiently. An approach has been 
developed in time domain, which employs the specific structure of corresponding 
characteristic curves. Accordingly, significant savings in computational time and 
memory arise in comparison to common discretisation schemes. Test results con- 
firm that this method is suitable for simulating driven oscillators. Furthermore, 
the method of characteristics can be applied to driven autonomous oscillators with 
quasi-periodic signals. Again, numerical calculations indicate good behaviour. 
These promising results suggest tailoring the technique to frequency modulated 
signals. 

Abstract. The application of Jacobi-Davidson style methods in electric circuit 
simulation will be discussed in comparison with other iterative methods (Arnoldi) 
and direct methods (QR, QZ). Numerical results show that the use of a precondi- 
tioner to solve the correction equation may improve the Jacobi-Davidson process, 
but may also cause computational and stability problems when solving the correc- 
tion equation. Furthermore, some techniques to improve the stability and accuracy 
of the process will be given. 



Introduction 

Pole-zero analysis is used in electrical engineering to analyse the stability of electric 
circuits [10,11,14]. For example, if a circuit is designed to be an oscillator, pole-zero 
analysis is one of the ways to verify that the circuit indeed oscillates. Another 
application is the verification of reduced order models over a wide frequency range 
[9]. Because the complexity of the circuits designed nowadays grows, together with 
the frequency range of interest, there is a need for faster algorithms that do not 
sacrifice accuracy. In this paper, the section on pole-zero analysis in circuit 
simulation introduces the pole-zero problem. The section on Jacobi-Davidson style 
methods describes these methods as alternatives to conventional methods for the 
pole-zero problem. Some typical numerical problems and techniques are discussed 
next. Finally, the methods are compared by numerical results, concluding with some 
future research topics. 



Pole-zero Analysis in Circuit Simulation 

The Kirchhoff Current Law and the Kirchhoff Voltage Law describe the topology of 
an electric circuit. Together with the Branch Constitutive Relations, which reflect 
the electrical properties of the branches, the two Kirchhoff Laws result in a system 
of differential algebraic equations [10]: 

$$ \frac{d}{dt} q(t,x) + j(t,x) = 0, \qquad (1) $$

where $x \in \mathbb{R}^n$ contains the circuit state and 
$q, j : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ are functions representing 
the reactive and resistive behaviour, respectively. The way (1) is solved 
depends on the kind of analysis (DC-analysis, AC-analysis, transient analysis, 
pole-zero analysis). In every analysis, the capacitance matrix 
$C \in \mathbb{R}^{n \times n}$ and the conductance matrix 
$G \in \mathbb{R}^{n \times n}$ appear: 

$$ C(t,x) = \frac{\partial q(t,x)}{\partial x}, \qquad G(t,x) = \frac{\partial j(t,x)}{\partial x}. $$

Both matrices are real, non-symmetric and sparse. 

Starting from a linearization around the DC-operating point, the time-domain 
formulation is as follows: 

$$ \begin{cases} C \dfrac{dx(t)}{dt} + G x(t) = e(t), \\ x(0) = 0, \end{cases} $$



where $e(t)$ models the excitation. Because not all properties can be computed in 
the time domain, the problem is transformed to the frequency domain by applying 
a Laplace transform: 

$$ (sC + G) \mathcal{X}(s) = \mathcal{E}(s), \qquad (3) $$

where $\mathcal{X}$, $\mathcal{E}$ are the Laplace transforms of the variables $x$, $e$ 
and $s$ is the variable in the frequency domain. The response of the circuit to a 
variation of the excitation is given by the transfer function 

$$ H(s) = (sC + G)^{-1}. \qquad (4) $$

The elementary response of circuit variable $\mathcal{X}_o$ to excitation 
$\mathcal{E}_i$ is given by 

$$ H_{oi}(s) = e_o^T (sC + G)^{-1} e_i. \qquad (5) $$

The poles are the values $p_k \in \mathbb{C}$ that satisfy $\det(p_k C + G) = 0$, hence 
$(G + p_k C) x = 0$ for some $x \neq 0$, which leads to a generalised eigenproblem 
($\lambda = -p_k$): 

$$ G x = \lambda C x, \quad x \neq 0. \qquad (6) $$



Because the problem of computing the zeroes is similar to the problem of computing 
the poles, the rest of this paper will consider the problem of computing the poles. 
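
For a circuit with only two unknowns the pole condition can be solved by hand, which makes the link between the determinant and the eigenproblem (6) explicit. The sketch below expands the determinant into a quadratic in the pole; the matrices are hypothetical values for illustration only, and the helper name `poles_2x2` is not from the paper. For realistic problem sizes a dense or iterative eigensolver is needed, as discussed in the following sections.

```python
import cmath


def poles_2x2(G, C):
    """Roots p of det(p*C + G) = 0 for 2x2 matrices (requires det(C) != 0)."""
    a = C[0][0] * C[1][1] - C[0][1] * C[1][0]                  # det(C)
    b = (C[0][0] * G[1][1] + C[1][1] * G[0][0]
         - C[0][1] * G[1][0] - C[1][0] * G[0][1])              # mixed terms
    c = G[0][0] * G[1][1] - G[0][1] * G[1][0]                  # det(G)
    root = cmath.sqrt(b * b - 4.0 * a * c)
    return (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)


G = [[2.0, -1.0], [-1.0, 2.0]]   # hypothetical conductance matrix
C = [[1.0, 0.0], [0.0, 0.5]]     # hypothetical capacitance matrix

p1, p2 = poles_2x2(G, C)
eigenvalues = (-p1, -p2)         # lambda = -p in the generalised eigenproblem
print(p1, p2)
```

Both poles of this example have negative real part, so the toy circuit is stable.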

Especially for large circuits (n > 10^), robust, iterative methods for the gener- 
alised eigenvalue problem (6) with sufficient accuracy and acceptable computational 
costs are needed. Furthermore, all right half-plane poles and no false right half-plane 




Jacobi-Davidson Methods in Pole-zero Analysis 351 



poles are desired. The dominant poles and zeroes must be accurate enough to pro- 
duce correct Bode-plots for the frequency range of interest. 

Two kinds of pole-zero methods are known in literature [10]: combined and 
separate pole-zero computation. This paper focuses on separate pole-zero compu- 
tation. 



Jacobi-Davidson Style Methods 

Because of its accuracy and robustness, the full-space QR-method (and the QZ-method 
to a lesser degree) is a popular choice as solver for the eigenproblem (6). 
However, the total costs of $O(n^3)$ and the necessity of an LU-decomposition, which 
destroys the sparsity of $G$ and causes numerical inaccuracies and maybe 
instabilities, become unacceptable for larger problems. Iterative methods like the 
implicitly restarted Arnoldi method also need the LU-decomposition and are designed 
to compute only a few ($m \ll n$) eigenvalues [11]. 

The Jacobi-Davidson method [13], on the other hand, is designed to converge 
fast to a few selected eigenvalues. Based on the Jacobi-Davidson method, the 
JDQR-method [8], which computes a partial Schur form, and the JDQZ-method [8], which 
computes a partial generalised Schur form, have been developed. Without going into 
much detail, the basic idea behind the Jacobi-Davidson methods is as follows. For the 
problem $Ax = \lambda x$, given the eigenpair approximation $(\theta_k, u_k)$: 

- Search a correction $v \in u_k^{\perp}$ for $u_k$ such that 

  $A(u_k + v) = \lambda (u_k + v)$. 

- Solve $v$ from the correction equation, with $r_k = A u_k - \theta_k u_k$: 

  $(I - u_k u_k^*)(A - \theta_k I)(I - u_k u_k^*) v = -r_k$. 

- Orthogonally expand the current basis $V$ with $v$. 

The Ritz vector $u_k = V s$ is obtained by applying a full-space method, for instance 
the QR-method, to the projected matrix $V^* A V$, resulting in the eigenpair 
$(\theta_k, s)$. The Jacobi-Davidson method satisfies a Ritz-Galerkin condition [13]. 
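
The correction step can be sketched for a small symmetric matrix. The code below is a deliberately simplified illustration, not the JDQR or JDQZ algorithm: it solves the correction equation exactly and does not keep a search space, in which case the iteration coincides with Rayleigh quotient iteration. It uses the known closed form of the exact correction, a multiple of the shifted solve minus the current vector, chosen such that the correction is orthogonal to the current vector. The helper names and the test matrix are hypothetical.

```python
import math


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        if a[c][c] == 0.0:
            a[c][c] = 1.0e-30        # guard against an exactly singular shift
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n + 1):
                a[r][j] -= f * a[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (a[r][n] - sum(a[r][j] * z[j] for j in range(r + 1, n))) / a[r][r]
    return z


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def jd_exact(A, u, tol=1.0e-12, max_iter=50):
    """Jacobi-Davidson steps with exact correction and no subspace (a sketch)."""
    n = len(u)
    norm = math.sqrt(dot(u, u))
    u = [ui / norm for ui in u]
    for _ in range(max_iter):
        Au = [dot(row, u) for row in A]
        theta = dot(u, Au)                                 # Ritz value
        r = [Au[i] - theta * u[i] for i in range(n)]       # residual
        res = math.sqrt(dot(r, r))
        if res < tol:
            break
        shifted = [[A[i][j] - (theta if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        z = solve(shifted, u)
        eps = 1.0 / dot(u, z)
        v = [eps * z[i] - u[i] for i in range(n)]          # correction, orthogonal to u
        w = [u[i] + v[i] for i in range(n)]
        norm = math.sqrt(dot(w, w))
        u = [wi / norm for wi in w]
    return theta, u, res


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]   # hypothetical test matrix
theta, u, res = jd_exact(A, [1.0, 0.0, 0.0])
print(theta, res)
```

In the full method the correction is only solved approximately and is used to expand the basis, so that earlier information is not discarded. That is where the preconditioning questions discussed next come in.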

The correction equation needs more attention. For the JDQR-method, it is 

$$ (I - QQ^*)(A - \theta_k I)(I - QQ^*) v = -r_k, \qquad (7) $$

where $Q \in \mathbb{R}^{n \times k}$. If the correction equation is solved exactly, 
the convergence of the Jacobi-Davidson method is quadratic [13]. Besides solving the 
correction equation exactly, one can use linear iterative methods, like GMRES, with or 
without preconditioning. Because exact solvers are often not feasible in practice, the 
focus is on iterative methods with preconditioning. Using a preconditioner, however, 
is not as easy as it seems. Consider a preconditioner $K \approx A - \theta_k I$. There 
are three major issues. Firstly, the preconditioner is projected afterwards 
($\tilde{K} = (I - QQ^*) K (I - QQ^*)$). Secondly, $A - \theta_k I$ becomes more and 
more ill-conditioned as the approximations $\theta_k$ approach the eigenvalue 
$\lambda$. Thirdly, $\theta_k$ changes every iteration, and so does $A - \theta_k I$. 




352 J. Rommes et al. 



Numerical Problems and Techniques 



In [5], a technique is described to reduce the problem size of the ordinary 
eigenproblem. The idea is to remove the columns (and corresponding rows) from 
$G^{-1}C$ which are equal to zero, as well as the rows (and corresponding columns) 
equal to zero. This is justified because rows and columns equal to zero have 
corresponding eigenvalues of value zero, and removing these rows and columns does not 
influence the other eigenvalues. Because the product $G^{-1}C$ is not available 
explicitly, for computational reasons, the $k$ rows and columns to keep are recorded 
in a matrix $S = [e_{j_1}, \dots, e_{j_k}]$, where $e_{j_i}$ is the $j_i$-th unit 
vector of length $n$. The reduced matrix is then defined by $G^{-1}CS$. 
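
The selection of rows and columns can be sketched on an explicit small matrix. This is an illustration only: in the application the product is not formed explicitly, the matrix `M` below is a hypothetical stand-in for it, and the reduced matrix is built here by keeping the selected rows and columns, which is an assumption about how the selection matrix is applied.

```python
def reduce_matrix(M, tol=0.0):
    """Drop every index whose row or whose column of M is entirely zero."""
    n = len(M)
    keep = [j for j in range(n)
            if any(abs(M[j][c]) > tol for c in range(n))       # row j is non-zero
            and any(abs(M[r][j]) > tol for r in range(n))]     # column j is non-zero
    reduced = [[M[r][c] for c in keep] for r in keep]
    return keep, reduced


# Hypothetical 4 x 4 example: column 2 and row 3 (0-based) are zero.
M = [[1.0, 2.0, 0.0, 5.0],
     [3.0, 4.0, 0.0, 6.0],
     [7.0, 8.0, 0.0, 9.0],
     [0.0, 0.0, 0.0, 0.0]]

keep, reduced = reduce_matrix(M)
print(keep, reduced)
```

The removed indices only contribute zero eigenvalues, so the non-zero eigenvalues of the example are those of the remaining 2 x 2 block. A single pass is shown; removing rows and columns can expose further zero rows or columns, in which case the selection could be repeated.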

Reduction of the problem in this way has a number of advantages. Firstly, 
Jordan blocks may be removed, thereby improving the stability and accuracy of 
the computations. Secondly, the computational costs will be reduced. Table 1 shows 
the degree of reduction for some example problems. 



Table 1. The size, reduced size and degree of reduction for some example 
problems. The data is extracted from [5] and [10]. 

Problem       Size   Size (reduced)   Reduction 
pz_09          504              365         28% 
pz_28          177               74         58% 
pz_36_osc      120               86         28% 
pz_agc_crt     114               96         16% 
jr_1           815              681         16% 
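
The reduction column of Table 1 follows directly from the two size columns. A minimal check, using only the sizes listed in the table:

```python
# Problem name -> (original size, reduced size), as listed in Table 1.
sizes = {
    "pz_09": (504, 365),
    "pz_28": (177, 74),
    "pz_36_osc": (120, 86),
    "pz_agc_crt": (114, 96),
    "jr_1": (815, 681),
}

# Degree of reduction in percent, rounded to the nearest integer.
reduction = {name: round(100.0 * (n - m) / n) for name, (n, m) in sizes.items()}

for name, percent in reduction.items():
    print(f"{name:12s} {percent:3d}%")
```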



Note, however, that the spectral properties of the reduced problem do not differ 
from the spectral properties of the original problem. As a consequence, the speed 
of convergence of the eigenmethod used will not be improved significantly. This 
technique is not applicable directly to the generalised eigenproblem. 

Considering the computation of the preconditioner (for the operator in the 
correction equation), one has to cope with two problems, i.e. the near singularity 
of the operator to precondition and the continuous change of the operator. These 
two problems appear in both Jacobi-Davidson QR and Jacobi-Davidson QZ. For 
computational reasons, a new preconditioner should be computed at most once 
for each new eigenvalue target. This is possible in practice, because the search 
space and corresponding Ritz-values in general contain good approximations for 
new eigenvalues. Some additional advantage can be gained by only recomputing the 
preconditioner for changes in $\theta_k$ larger than a certain threshold. Singularity 
problems can be dealt with by replacing zero diagonal elements by a small threshold 
value, a common technique for incomplete LU-decompositions. The costs and fill-in can 
be controlled by using a drop-tolerance for non-diagonal elements, resulting in ILUT 
[12] decompositions. 

Numerical Results and Conclusions 

The data for the test problems was generated by the in-house analog electric circuit 
simulator Pstar of Philips Research [14]. Both full-space and iterative methods have 
been used to solve the ordinary and generalised eigenproblem. A small selection of 
the results presented in [10] has been made to identify the problems which are typ- 
ical for the different approaches. Implementations of the Jacobi-Davidson methods 
are based on the algorithms in [4] and are available on the Internet: refer to [1] for 
JDQR and to [2] for JDQZ. Experiments have been done in Matlab 5.3 [3]. 

The transformation of the generalised eigenproblem to the ordinary eigenprob- 
lem may introduce inaccuracies, as has been mentioned before. Bode-plot (a) in 
Fig. 1 shows an example of this. Before computing the eigenvalues, the problem 
has been reduced from size 30 x 30 to 12 x 12 with the method described in [5]. The 
solution computed by QR differs significantly on two points from the exact solu- 
tion, which is computed by using (5) for several frequencies s. The two notches are 
caused by non-cancelling poles and zeroes, which do cancel in the original problem. 
It is conceivable that this is caused by the inversion of G. The iterative methods 
Arnoldi and JDQR suffer even more from inaccuracies. Bode-plot (b) in Fig. 1 
shows the computed solutions for the generalised eigenproblem. In this case, the 
QZ-method nearly resembles the exact solution, while the iterative schemes still 
suffer from inaccuracies. The fact that QZ performs better than QR, while both 
methods in theory compute the same eigenvalues, strengthens the argument that 
the inversion of $G$ introduces critical inaccuracies. A general remark can be made 
about the interpretation of Bode-plots. It is not clear how accurate the original data 
of the circuit is. As a consequence, one may argue that the resulting Bode-plots 
are only representative up to a certain frequency, depending on the accuracy of 
the original data. For applications in the RF-area, this issue and the accuracy of 
eigenvalues near zero play an even more important role. 

Fig. 1. (a) Bode-plot computed from the ordinary eigenproblem; (b) Bode-plot 
computed from the generalised eigenproblem. 

Using preconditioners when solving the correction equation of the JDQR method 
does indeed improve the speed of convergence, as can be seen in Fig. 2, where graph 
(a) shows the convergence history when using GMRES as solver, and graph (b) 
when using GMRES with an ILUT preconditioner (t == 10 ®). The quality of the 




The test subspace W is computed as 




Fig. 2. (a) Convergence history for JDQR with GMRES; (b) Convergence history 
for JDQR with ILUT-preconditioned GMRES {t = 10“^). A convergence history 
plots the residual against the Jacobi-Davidson iteration number; each drop below 
the tolerance means an accepted eigenvalue. 



improvement strongly depends on the accuracy of the preconditioner. When using 
an ILUT preconditioner, a drop-tolerance of maximal t = 10~® is acceptable. This 
shows also one of the difficulties: the preconditioner has to be rather accurate, and 
in the case of ILU based preconditioners this means in general high costs. Apart 
from that, the ILU based preconditioners experience problems for singular matrices, 
and the matrix $A - \theta_k I$ becomes more and more singular. This last problem has 
appeared to be more severe for the JDQZ method. The fact that the preconditioner 
is projected afterwards does not have a significant influence on the quality. 

The reduction technique for the ordinary eigenproblem does lead to lower 
computational costs for both the direct and iterative methods. However, the gain 
depends on the degree of reduction, and is more pronounced for the iterative methods, 
because of the dominating matrix-vector products. The computational gain varied 
from 15% for the QR-method to 30% for the iterative methods, with top gains 
of 50%. Also, the condition number of the problem is improved, and if Jordan 
blocks are removed, the computed eigenvalues are more accurate. The number of 
iterations needed to compute the eigenvalues is not decreased, as expected. For 
more information and results of this reduction technique, refer to [5,11]. 

The observations launch ideas for future work. One can think of efficient updates 
for preconditioners [6], model reduction techniques and combinations of 
Jacobi-Davidson with other iterative methods like Arnoldi or combined pole-zero 
methods, such as Padé via Lanczos [7]. 



References 

1. http://www.math.uu.nl/people/sleijpen/JD_software/JDQR.html. 

2. http://www.math.uu.nl/people/sleijpen/JD_software/JDQZ.html. 

3. http://www.mathworks.com. 

4. Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H., 
Eds. Templates for the Solution of Algebraic Eigenvalue Problems: a Practical 
Guide. SIAM, 2000. 

5. Bomhof, C. Jacobi-Davidson methods for eigenvalue problems in pole zero 
analysis. Nat. Lab. Unclassified Report 012/97, Philips Electronics NV, 1997. 

6. Bomhof, C. Iterative and parallel methods for linear systems, with applications 
in circuit simulation. PhD thesis, Utrecht University, 2001. 

7. Feldmann, P., and Freund, R. W. Efficient linear circuit analysis by Padé 
approximation via the Lanczos process. IEEE Trans. CAD 14 (1995), 639-649. 

8. Fokkema, D. R., Sleijpen, G. L., and van der Vorst, H. A. Jacobi-Davidson 
style QR and QZ algorithms for the reduction of matrix pencils. 
SIAM J. Sc. Comp. 20, 1 (1998), 94-125. 

9. Heres, P. J., and Schilders, W. H. Reduced order modelling of RLC-networks 
using an SVD-Laguerre based method. In SCEE 2002 Conference 
Proceedings (2002). 

10. Rommes, J. Jacobi-Davidson methods and preconditioning with applications 
in pole-zero analysis. Master's thesis, Utrecht University, 2002. 

11. Rommes, J., van der Vorst, H. A., and ter Maten, E. J. W. Jacobi-Davidson 
Methods and Preconditioning with Applications in Pole-zero Analysis. 
In Progress in Industrial Mathematics at ECMI 2002 (2002). 

12. Saad, Y. Iterative methods for sparse linear systems. PWS Publishing 
Company, 1996. 

13. Sleijpen, G. L., and van der Vorst, H. A. A Jacobi-Davidson Iteration 
Method for Linear Eigenvalue Problems. SIAM Review 42, 2 (2000), 267-293. 

14. ter Maten, E. J. W. Numerical methods for frequency domain analysis of 
electronic circuits. Surv. Maths. Ind. 8 (1998), 171-185. 




The Electro-Quasistatic Model in Different 
Applications 



Ute Schreiber, Jurgen Flehr, Victor Motrescu, and Ursula van Rienen 
Institute of General Electrical Engineering, Rostock University, Germany 



Abstract. An electromagnetic field can be considered as slowly varying if the 
wavelength is large compared to the problem region. In the electro-quasistatic case 
it then may be assumed that the time-derivative of the magnetic flux is negligi- 
ble, whereas the displacement currents have to be taken into account. Under these 
assumptions Maxwell’s equations for time harmonic fields reduce to a complex Pois- 
son’s equation and discretization yields a complex symmetric system of equations. 
Krylov-subspace methods with an algebraic multigrid (AMG) preconditioner are 
used for fast solution. The electro-quasistatic model is applicable in many different 
constellations. This paper deals with applications from three different fields: high- 
voltage engineering, neural sensor-actor systems and the influence of slowly varying 
fields on human tissue. 



Introduction 

An electromagnetic field can be considered as slowly varying if the wavelength is 
large compared to the problem region which means 



$$|kR| \ll 1 \quad \text{with the wavenumber} \quad k = \omega\sqrt{\mu\varepsilon\left(1 - i\,\frac{\sigma}{\omega\varepsilon}\right)}$$

and some characteristic dimension R of the studied system [1]. For our time- 
harmonic applications we obtain the estimates \kR\ ^ 2 • 10~® for the insulator 
problem and $|kR| \approx 0.0037$ for 50 Hz fields in the human body. The spatial
wavelength is given by $|1/k|$.
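
The slowly-varying criterion can be checked numerically. The following minimal sketch evaluates $|kR|$ from the wavenumber expression above; the material values and the characteristic dimension are illustrative assumptions, not the parameters used for the estimates quoted in the text.

```python
import cmath
import math

MU0 = 4e-7 * math.pi       # vacuum permeability in H/m
EPS0 = 8.8541878128e-12    # vacuum permittivity in F/m


def wavenumber(freq_hz, eps_r, sigma, mu_r=1.0):
    """Complex wavenumber k = omega * sqrt(mu * eps * (1 - i*sigma/(omega*eps)))."""
    omega = 2.0 * math.pi * freq_hz
    eps = EPS0 * eps_r
    mu = MU0 * mu_r
    return omega * cmath.sqrt(mu * eps * (1.0 - 1j * sigma / (omega * eps)))


# Assumed, tissue-like values chosen only for illustration.
k = wavenumber(freq_hz=50.0, eps_r=80.0, sigma=0.2)
R = 0.5                    # assumed characteristic dimension in m
kR = abs(k * R)
print(f"|kR| = {kR:.2e}")  # a value far below 1 supports the quasistatic model
```

For a given application only `eps_r`, `sigma` and `R` need to be replaced by the actual material data and geometry.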

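For a predominantly electric field, $\partial\mathbf{B}/\partial t \approx 0$ and $\partial\mathbf{D}/\partial t \neq 0$ may be assumed
in case of $|kR| \ll 1$. Regarding e.g. time harmonic electro-quasistatic fields in
anisotropic media, these assumptions yield the following set of simplified Maxwell’s
equations

$$\operatorname{curl}\mathbf{E} \approx 0, \quad (1) \qquad\qquad \operatorname{div}\mathbf{D} = \rho, \quad (3)$$

$$\operatorname{curl}\mathbf{H} = i\omega\mathbf{D} + \mathbf{J}, \quad (2) \qquad\qquad \operatorname{div}\mathbf{B} = 0. \quad (4)$$

Therein, $\mathbf{H}$ and $\mathbf{E}$ denote the magnetic and electric field strength, $\mathbf{D} = \varepsilon\mathbf{E}$ the
electric flux density, $\mathbf{B} = \mu\mathbf{H}$ the magnetic induction, and $\mathbf{J} = \mathbf{J}_I + \sigma\mathbf{E} + d\operatorname{grad}\rho$ the
current density composed of the impressed current density $\mathbf{J}_I$, the conduction
current density $\sigma\mathbf{E}$ and the convection current density $d\operatorname{grad}\rho$ with the charge carrier density
$\rho$ and the diffusion constant $d$. Here, we have used the common representation
$\mathbf{E}(\mathbf{r}, t) = \mathrm{Re}(\underline{\mathbf{E}}(\mathbf{r})\,e^{i\omega t})$ with the complex amplitude $\underline{\mathbf{E}}(\mathbf{r}) = \mathbf{E}(\mathbf{r})\,e^{i\phi}$. The non-linear,
time-dependent rank two tensors $\sigma$, $\varepsilon$ and $\mu$ are assumed to be piecewise
constant functions with $\sigma > 0$, $\varepsilon > 0$ and $\mu > 0$ ($\nu = \mu^{-1}$) if not stated differently.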

We assume that appropriate boundary and interface conditions are defined.
According to (1) the electric phasor $\mathbf{E}$ is curl-free and thus may be described as the
gradient of a scalar potential. Note that this is a complex potential: $\mathbf{E} = -\operatorname{grad}\varphi$.
Under these conditions and from (1) - (4) we get the complex divergence equation
for the time-harmonic EQS potential

$$\operatorname{div}\left[(i\omega\varepsilon + \sigma)\operatorname{grad}\varphi\right] = \operatorname{div}(\mathbf{J}_I). \qquad (5)$$

With the Finite Integration Technique (FIT) [15] the continuous equation (5)
is transformed into the discretized one

$$S\,(i\omega M_\varepsilon + M_\sigma)\,G\,\Phi = S\,\mathbf{j}_I \qquad (6)$$

with the divergence operator $S$, the material matrices $M_\varepsilon$, $M_\sigma$ and the gradient
operator $G$ [13]. The system matrix in (6) is a complex symmetric, almost
singular matrix with seven bands. The large condition number mainly results from
large differences in the material parameters.
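
The structure of the system matrix can be illustrated on a one-dimensional chain of grid cells. The sketch below is not the MAFIA implementation: it assumes that the divergence operator equals the negative transpose of the gradient operator (so the matrix is assembled as $G^T(i\omega M_\varepsilon + M_\sigma)G$, which differs from (6) only by a sign), it uses dimensionless, made-up material values, and the function name `eqs_matrix_1d` is hypothetical.

```python
import numpy as np


def eqs_matrix_1d(sigma, omega_eps):
    """Assemble G^T (i*omega*M_eps + M_sigma) G for a 1D chain of cells.

    sigma and omega_eps hold one value per grid edge; the potentials at the
    two end nodes are treated as prescribed (Dirichlet) and eliminated.
    """
    sigma = np.asarray(sigma, dtype=float)
    omega_eps = np.asarray(omega_eps, dtype=float)
    n_edges = sigma.size
    # Discrete gradient: edge k connects node k and node k + 1.
    G = np.zeros((n_edges, n_edges + 1))
    idx = np.arange(n_edges)
    G[idx, idx] = -1.0
    G[idx, idx + 1] = 1.0
    G = G[:, 1:-1]  # drop the two boundary nodes
    M = np.diag(sigma + 1j * omega_eps)  # diagonal material matrix
    return G.T @ M @ G


# Illustrative values: a poorly conducting and a well conducting region.
A = eqs_matrix_1d(sigma=[1e-3, 1e-3, 1.0, 1.0, 1e-3],
                  omega_eps=[0.5, 0.5, 2.0, 2.0, 0.5])
print("symmetric:", np.allclose(A, A.T))
print("Hermitian:", np.allclose(A, A.conj().T))
```

The matrix equals its transpose but not its conjugate transpose, which is why solvers for Hermitian systems such as the classical conjugate gradient method are not directly applicable.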

The electro-quasistatic model has been implemented in the software package
MAFIA [10], which is based on FIT, on a Cartesian grid. MAFIA is used for
geometric modeling¹, creation of the complex symmetric system of equations and
post-processing of the examples shown in this paper.



Application Fields for the EQS Model 

Electric Field on Contaminated High-Voltage Insulators 

High-voltage insulators are stressed by the applied electric field as well as by other
environmental factors. As a result, the surface of the insulating material ages
and the dielectric material loses its hydrophobic and insulating characteristics.
Contamination of the object with water droplets accelerates the aging process.
Experimental investigations have shown that with increasing applied voltage
the droplets first vibrate, are then elongated in the direction of the applied electric
field, and finally a flash-over bridging the water droplets occurs. To improve the
understanding of the aging phenomenon it seems advisable to observe single droplets on
an insulating surface. The shape of the droplets supplies more information about
the status of the insulating material [8]. In addition to the experiments [9] the
simulation of the electric field strength near the water droplets is necessary. It

¹ The human body is discretized with CST MicrowaveStudio™ [10].

Fig. 1. Magnitude of electric field strength on the epoxy resin sample
(symbol AES_E; minimum value 0.0 kV/mm, maximum value 1.58 kV/mm).



Fig. 2. Current $j_K(t_i)$, $i = 1, \dots, n$, calculated in MAFIA for n = 6
and by (8) for large n (single dots). Vertical axis: J in mA/cm².



allows the electric forces on the droplet surfaces to be calculated and thus a
correlation between the shapes and the droplet movement to be found [14]. For experimental
investigations of droplet movements it is necessary to eliminate other parameters
which influence the distribution of field strength on the insulating surface.

This is why simplified test specimens (blocks of epoxy resin) are used for
experiments [9] and simulations. The considered high-voltage devices are driven with
50 Hz a.c. voltage. The epoxy resin has a relative permittivity of $\varepsilon_r = 4$ and a
conductivity of $\sigma$ = 10“^^ S/m. The water drops have a relative permittivity of $\varepsilon_r = 81$
and a conductivity of $\sigma$ = 10“^ S/m. The relative permittivity of the air surrounding the
structure is $\varepsilon_r = 1.000576$. A voltage of 15 kV is used. Figure 1 shows the absolute
value of the electric field strength on an insulating surface with two water droplets.
The position of the latter is clearly visible by the augmented field strength at the
“triple points” with air, water and insulating material; the maxima are in the
direction of applied voltage. From the electric field the force density can be computed
[14]. The final goal is to simulate the droplet movements in a coupled calculation.

Electric Field of Living Neurons and the Neuron-Electrode 
Interface 

Another application of the EQS model is the simulation of the electric field of living
neurons and the neuron-electrode interface. The possibility to cultivate a living
neural network on a microchip opens new opportunities in the field of neurophysics. With
a living neural network on a chip it is possible to capture the signals of neurons,
e.g. the action potential of a single neuron.

An action potential is an electrical signal of excitable cells. Such signals can 
propagate along the axon of the nerve. Hence the action potential is essential for 
signal propagation. The present study is part of a project where neurons from mice 
are exposed to a certain liquid and the response of the neuron is recorded. Thus the 
neurons are used as sensors and facilitate the analysis of the liquid provided that 
an interpretation of the signals is possible. The range of applications of this kind of 
cell based sensors is wide, including pharmaceutical screening and toxin detection. 

To obtain an increased output signal we want to simulate the electric field of
an active neuron firing an action potential and the neuron-electrode interface. The
field will be calculated by means of Maxwell’s equations using FIT. The results will
be compared with practical measurements.

As proposed in the literature [1], electro-quasistatics is a good approximation for
the simulation because the wavelength is much larger than the considered geometry:
the wavelength is on the order of centimeters, whereas the geometry is on the order
of millimeters in the worst case [5]. So we may assume the EQS model to be a good approximation.
Later, a posteriori calculations of the magnetic field will be used for validation.

We base our simulations on transient EQS equations which will be solved for
different times $t_i$. Our strategy is to solve this equation first under the assumption
that we have a steady state system with $\partial\mathbf{D}/\partial t + \mathbf{J} = 0$. So we can calculate
$\mathbf{E}_0 := \mathbf{E}(t_0)$ with the electrostatic equations. After having calculated $\mathbf{E}_0$ and therefore
$\mathbf{D}_0 := \mathbf{D}(t_0)$ we calculate the fields $\mathbf{E}_i$ for $i \geq 1$ step by step. For this task we can
show that we only need $\mathbf{D}_0$ and Ampere’s law. Actually, it is enough if we find a
solution for the equation



div(£ grad(/9ne-u; ) = div(Do/d + (7) 

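with $\partial\mathbf{D}/\partial t \approx (\mathbf{D}_{new} - \mathbf{D}_{old})/\Delta t$ and $\mathbf{E} = -\operatorname{grad}\varphi$. At this point the reader may
have noticed that we have not yet described how we obtain $\mathbf{J}$ for the right-hand side of
equation (7).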

During an action potential of a neuron the conductivity of the membrane
changes drastically. Thanks to Hodgkin and Huxley we can model this with the
following set of equations, the Hodgkin-Huxley equations [16]:

$$C_m \frac{dV}{dt} = -\left(\bar g_{Na}\, m^3 h\,(V - E_{Na}) + \bar g_K\, n^4\,(V - E_K) + \bar g_l\,(V - E_l)\right) \qquad (8)$$

$$\frac{dn}{dt} = \alpha_n (1 - n) - \beta_n n, \quad \frac{dm}{dt} = \alpha_m (1 - m) - \beta_m m, \quad \frac{dh}{dt} = \alpha_h (1 - h) - \beta_h h$$

with the membrane capacity $C_m$, the potential difference $V$ over the membrane,
the time $t$, the membrane conductivity for sodium ions $\bar g_{Na} m^3 h$, the membrane
conductivity for potassium ions $\bar g_K n^4$, the membrane conductivity $\bar g_l$ for all other
currents (leakage current), the resting potential for sodium $E_{Na}$, the resting
potential for potassium $E_K$, the resting potential for the rest $E_l$ and given constants
$\bar g_{Na}$, $\bar g_K$, $\bar g_l$ as well as given functions $\alpha_i$, $\beta_i$ ($\alpha_i = \alpha_i(V)$, $\beta_i = \beta_i(V)$). We use this set
of equations to calculate the conductivity for different states of the system, more
precisely, the conductivity of sodium, potassium and the leakage current for the
membrane of a neuron. Also, these equations yield the potential difference across
the membrane.
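
How the Hodgkin-Huxley equations deliver time-dependent membrane conductivities can be sketched with a simple forward-Euler integration. The parameter values and rate functions below are the widely used squid-axon set in the modern sign convention; they are an assumption made for illustration and are not taken from this paper, and the explicit time stepping is not the scheme used in the coupled simulation.

```python
import math

# Assumed classic squid-axon parameters (not from the paper).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximum conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _x_over_1mexp(x, y):
    """x / (1 - exp(-x / y)) with the removable singularity at x = 0 handled."""
    if abs(x / y) < 1e-6:
        return y * (1.0 + x / (2.0 * y))
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Voltage-dependent rate functions alpha_i(V), beta_i(V) in 1/ms."""
    a_n = 0.01 * _x_over_1mexp(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * _x_over_1mexp(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return a_n, b_n, a_m, b_m, a_h, b_h


def simulate(v0=-50.0, t_end=20.0, dt=0.01):
    """Integrate (8) with forward Euler; returns t (ms), V (mV), g_Na, g_K (mS/cm^2)."""
    a_n, b_n, a_m, b_m, a_h, b_h = rates(-65.0)  # gates start at resting values
    n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)
    v = v0  # initial depolarization that triggers the action potential
    ts, vs, g_nas, g_ks = [], [], [], []
    for step in range(int(round(t_end / dt)) + 1):
        g_na = G_NA * m ** 3 * h
        g_k = G_K * n ** 4
        ts.append(step * dt)
        vs.append(v)
        g_nas.append(g_na)
        g_ks.append(g_k)
        a_n, b_n, a_m, b_m, a_h, b_h = rates(v)
        dv = -(g_na * (v - E_NA) + g_k * (v - E_K) + G_L * (v - E_L)) / C_M
        n += dt * (a_n * (1.0 - n) - b_n * n)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        v += dt * dv
    return ts, vs, g_nas, g_ks


ts, vs, g_nas, g_ks = simulate()
print(f"peak membrane potential: {max(vs):.1f} mV")
print(f"peak sodium conductivity: {max(g_nas):.1f} mS/cm^2")
```

The lists `g_nas` and `g_ks` are the time-dependent sodium and potassium conductivities; sampled at the times $t_i$, quantities of this kind are what a field solver would need as membrane material data.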

Having already coupled the Hodgkin-Huxley equations with Maxwell’s equations,
which is an important first result, we are now able to use this for the
transient EQS simulation: Figure 2 shows the currents of sodium $j_{Na}$ and potassium
$j_K$ calculated by MAFIA and by (8) as a reference: for only six points in time the
agreement is already very good. Based on these currents a 3D field calculation is
carried out according to (7). Later, we will of course use finer time steps than in
these proof-of-principle simulations.

Until now the electric field is calculated as described above, including the
calculation of a potential at some electrode in a simplified model of a neuron-electrode
interface: at the moment we ignore the effects of electrolysis and consider only
capacitive effects. The next steps will then be to implement all necessary equations.
Furthermore, a comparison with other software packages able to simulate
electrolysis is foreseen.



Slowly Varying Electromagnetic Fields in the Human Body 

Human body tissues exposed to low frequency electromagnetic fields present two
kinds of conductivity. Some tissue classes are isotropic (e.g. fat tissue) whereas other
tissue classes (e.g. skeletal and cardiac muscles) are anisotropic, having a conductivity
that is higher along the fibers than across, which can be described by a diagonal
tensor only in a local coordinate system [17]

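$$\sigma_L = \begin{pmatrix} \sigma_l & 0 & 0 \\ 0 & \sigma_t & 0 \\ 0 & 0 & \sigma_t \end{pmatrix} \qquad (9)$$

where $\sigma_l$ is the longitudinal conductivity and $\sigma_t$ is the transverse conductivity.

To express this tensor in a global coordinate system, it has to be rotated by means
of a rotation matrix $R$

$$\sigma_G = R\,\sigma_L\,R^{T} \qquad (10)$$

where $R$ is the product of two other rotation matrices, $R = R_{xy} R_{xz}$:

$$R_{xy} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad R_{xz} = \begin{pmatrix} \sin\vartheta & 0 & \cos\vartheta \\ 0 & 1 & 0 \\ \cos\vartheta & 0 & -\sin\vartheta \end{pmatrix} \qquad (11)$$

The two angles $\phi$ and $\vartheta$ are the rotation angles about the z- and y-axes, respectively.
Building the product of $R_{xy}$ and $R_{xz}$

$$R = \begin{pmatrix} \sin\vartheta\cos\phi & -\sin\phi & \cos\vartheta\cos\phi \\ \sin\vartheta\sin\phi & \cos\phi & \cos\vartheta\sin\phi \\ \cos\vartheta & 0 & -\sin\vartheta \end{pmatrix} \qquad (12)$$

and substituting (12) into (10) we obtain the conductivity tensor $\sigma_G$ of muscle tissues
expressed in a global coordinate system. This is a full and symmetric tensor.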
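
The rotation of the local tensor into the global frame can be sketched in a few lines. The example assumes that $R_{xy}$ is the standard rotation about the z-axis, that the global tensor is obtained as $R\,\sigma_L R^T$, and it uses made-up conductivity values; the function names are hypothetical.

```python
import numpy as np


def rotation(phi, theta):
    """R = R_xy @ R_xz; the first column of R is the fiber direction."""
    r_xy = np.array([[np.cos(phi), -np.sin(phi), 0.0],
                     [np.sin(phi), np.cos(phi), 0.0],
                     [0.0, 0.0, 1.0]])
    r_xz = np.array([[np.sin(theta), 0.0, np.cos(theta)],
                     [0.0, 1.0, 0.0],
                     [np.cos(theta), 0.0, -np.sin(theta)]])
    return r_xy @ r_xz


def global_conductivity(sigma_l, sigma_t, phi, theta):
    """Rotate diag(sigma_l, sigma_t, sigma_t) from the fiber frame to the global frame."""
    R = rotation(phi, theta)
    sigma_local = np.diag([sigma_l, sigma_t, sigma_t])
    return R @ sigma_local @ R.T


# Illustrative values in S/m and radians (assumptions, not measured data).
sigma_g = global_conductivity(sigma_l=0.5, sigma_t=0.1, phi=0.7, theta=1.1)
fiber = rotation(0.7, 1.1)[:, 0]
print(np.round(sigma_g, 4))
print("symmetric:", np.allclose(sigma_g, sigma_g.T))
```

The result has non-zero off-diagonal entries but stays symmetric, and the fiber direction remains an eigenvector with eigenvalue `sigma_l`, which is a convenient consistency check for an implementation.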

In order to simulate slowly varying electromagnetic fields in the human body
taking advantage of the properties of FIT, the latter has to be extended to fully
anisotropic materials. The classical method can handle materials that are isotropic
or diagonal anisotropic in a global coordinate system. It was then extended in [18]
to gyrotropic materials, for which, although their material tensor has some
off-diagonal terms, it is possible to align one of the coordinate axes to one of the
main directions in the material. We will follow the basic ideas, but in the case of the
human body no single main direction is ascertainable, as muscle fibers have
miscellaneous orientations.

The off-diagonal terms from the global conductivity tensor of muscle tissues
will connect components of the electric current density and electric voltage which
are allocated in different places on the FIT grid. For this reason, in each location of
an electric current density component we define three electric voltage components
by interpolating each of them among four values situated in the first vicinity on
the grid, oriented on the same axis. To keep the symmetry of the material matrices,
the diagonal terms from the conductivity tensor of muscle tissues are averaged
from primal grid cells to dual facets as in the classical FIT, but the off-diagonal
elements, which couple components of electric current density and electric voltage
on two axes, are averaged along a dual edge oriented on the third axis. This
approach implies some first order approximation, considering that primal edges have
about the same length as the corresponding dual edges. The error introduced by
this approximation vanishes in case of an equidistant grid, which is true for our
human body model because it consists of cubic voxels. By introducing the anisotropy
in the FIT equation for the EQS regime and using a realistic human body model of
1 mm resolution as well as the software package PETSc [21], [19], [20] for parallel
computation, we hope to obtain very fast and accurate solutions to the problem of
simulating low frequency electromagnetic fields in the human body. A major part
of the implementation has already been completed.



Solving the Linear Systems 

In this paper the Krylov-subspace methods BiCGCR (Bi-Orthogonal Conjugate
Gradient Conjugate Residual) [3,4], QMR (Quasi Minimal Residual) and the so-called
CSYM [2] are compared, each combined with Algebraic Multigrid (AMG) or Jacobi
as preconditioner. Explicit descriptions of the algorithms, further references and a
detailed investigation for complex symmetric systems can be found in [3,4,11]. Both
preconditioners are implemented in the software package PEBBLES [11,12]; other
preconditioners proved to be too expensive [4].
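
The idea of a Krylov method that exploits complex symmetry can be illustrated with a much simpler relative of the solvers compared here. The sketch below implements the conjugate orthogonal conjugate gradient method (COCG) with a Jacobi preconditioner; COCG is a swapped-in technique chosen for brevity and is none of BiCGCR, QMR or CSYM. The test matrix is a made-up one-dimensional model problem with dimensionless material values, and all function names are hypothetical.

```python
import numpy as np


def chain_matrix(coeff):
    """Complex symmetric model matrix G^T diag(coeff) G on a 1D chain of cells."""
    n_edges = coeff.size
    G = np.zeros((n_edges, n_edges + 1))
    idx = np.arange(n_edges)
    G[idx, idx] = -1.0
    G[idx, idx + 1] = 1.0
    G = G[:, 1:-1]  # eliminate the two Dirichlet boundary nodes
    return G.T @ np.diag(coeff) @ G


def cocg_jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi-preconditioned COCG for complex symmetric A (A == A.T)."""
    d_inv = 1.0 / np.diag(A)        # Jacobi preconditioner
    x = np.zeros(b.shape, dtype=complex)
    r = b.astype(complex)           # residual for the zero start vector
    z = d_inv * r
    p = z.copy()
    rho = r @ z                     # unconjugated bilinear form r^T z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        q = A @ p
        alpha = rho / (p @ q)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = d_inv * r
        rho_new = r @ z
        p = z + (rho_new / rho) * p
        rho = rho_new
    return x, max_iter


# Two material regions with assumed coefficients sigma + i*omega*eps.
coeff = np.where(np.arange(40) < 20, 1e-3 + 0.5j, 1.0 + 2.0j)
A = chain_matrix(coeff)
b = np.ones(A.shape[0])
x, iterations = cocg_jacobi(A, b)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"iterations: {iterations}, relative residual: {rel_res:.1e}")
```

The only difference to the classical preconditioned conjugate gradient method is the use of the unconjugated product `r @ z`, which is what makes short recurrences possible for complex symmetric matrices. A Jacobi preconditioner is used here only because it fits in one line; an AMG preconditioner would replace the multiplication with `d_inv`.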

Results are shown for an insulator sample, an epoxy resin block of 100 mm ×
100 mm × 20 mm with two water droplets on top. The test object has horizontally
embedded electrodes with a center distance of 35 mm and a radius of 7.5 mm. The
droplets’ diameter is 6 mm (hemispheres), their center distance is 10 mm according
to the accompanying experiments [9].

The characteristic convergence behavior for the methods is shown in Fig. 3.
QMR and BiCGCR with the AMG-preconditioner PEBBLES perform very similarly
with respect to the number of iterations. The assumption that the combination
of CSYM with PEBBLES instead of Jacobi improves convergence could not be
verified. This fact is also reflected in Table 1 where CPU-times are specified.

The AMG-preconditioner obviously accelerates the iteration process in spite of
the relatively large setup times compared to classical iterative solvers. PEBBLES’
coarsening factor a, which influences the number of necessary levels and the
dimensions of the reduced problems, also strongly influences the performance: For

Fig. 3. BiCGCR, QMR and CSYM with Jacobi- (upper 3) and AMG-preconditioner
(lower 3 curves).

Fig. 4. AMG-QMR with different coarsening factors a.





                 ε         setup (sec)   solver (sec)   total time (sec)   # iter.
AMG-BiCGCR       10^       187.77        703.44         891.21             15
AMG-QMR          10^-10    186.83        493.21         680.04             15
AMG-CSYM         10^-10    187.14        >2,000         >2,000             >65
Jacobi-BiCGCR    10“^      10.45         856.15         866.60             41
Jacobi-QMR       10^-2     10.43         810.17         820.60             40
Jacobi-CSYM                10.61         1,390.01       1,400.62           90

Table 1. Number of mesh points Nh = 450,241, 6 levels, a = 0.01, mesh size ratio
ph = [...]. The calculations were done on a SPARC SUN Ultra-1 with 296 MHz.



smaller a the total CPU-time grows in spite of lower setup times and the conver- 
gence curve gets slightly oscillatory (see Fig. 4). 



Conclusions and Acknowledgment The electro-quasistatic model is suitable
for many applications in the low frequency regime. Together with powerful
algebraic solution methods and the discretization with FIT, it opens various new
application areas of detailed electromagnetic field simulation.

The authors wish to thank Markus Clemens and Frank Sachse for many helpful
discussions, especially on the human body problem.



References 

1. H.A. Haus, J.R. Melcher, Electromagnetic Fields and Energy, Prentice-Hall,
Inc., 1989.

2. A. Bunse-Gerstner, R. Stöver, On a Conjugate Gradient-Type Method for Solving
Complex Symmetric Linear Systems, Lin.Alg.Appl., Vol. 287 (1999): 105 - 123.

3. M. Clemens, R. Schuhmann, U. van Rienen, T. Weiland, Modern Krylov Subspace
Methods in Electromagnetic Field Computation Using the Finite Integration
Theory, ACES Journal, Vol. 11 (1996): 70 - 84.

4. M. Clemens, T. Weiland, U. van Rienen, Comparison of Krylov-Type Methods
for Complex Linear Systems Applied to High-Voltage Problems, IEEE-T.Mag.,
Vol. 34 (1998): 3335 - 3338.

5. J. Dudel, R. Menzel, R.F. Schmidt, Neurowissenschaft, Springer, 2001.

6. R.W. Freund, N.M. Nachtigal, An Implementation of the QMR Method Based
on Coupled Two-Term Recurrences, SIAM J.Sci.Comput., Vol. 15 (1994): 297
- 312.

7. J. Keener, J. Sneyd, Mathematical Physiology, Springer, 2nd printing, 2001.

8. S. Keim, D. König, Study of the Behavior of Droplets on Polymeric Surfaces
under the Influence of an Applied Electrical Field, Proc. IEEE-CEIDP (1999):
707 - 710.

9. M. Kneuer, Diploma Thesis, TU Darmstadt, 2000.

10. MAFIA 4, CST GmbH, Bad Nauheimer Str. 19, D-64289 Darmstadt, Germany.

11. S. Reitzinger, U. Schreiber, U. van Rienen, Algebraic Multigrid Methods for
Complex Symmetric Matrices and Applications, Journal for Computational
and Applied Mathematics, to appear.

12. S. Reitzinger, PEBBLES - User’s Guide, 1999, www.sfb013.uni-linz.ac.at.

13. U. van Rienen, Numerical Methods in Computational Electrodynamics - Linear
Systems in Practical Applications, Springer-LNCSE, Vol. 12, 2000.

14. U. Schreiber, U. van Rienen, S. Keim, Simulation of Electric Field Strength
and Force Density on Contaminated H-V Insulators, Springer-LNCSE, Vol. 18
(2001): 79 - 86.

15. T. Weiland, A discretization method for the solution of Maxwell’s equations for
six-component fields, Electron. Commun. AEÜ, Vol. 31 (1977): 116 - 120.

16. E.K. Yeargers, R.W. Shonkwiler, J.V. Herod, An Introduction to the
Mathematics of Biology, Birkhäuser, 1996.

17. F.B. Sachse et al., Comparison of Solutions to the forward Problem in
Electrophysiology with homogeneous, heterogeneous and anisotropic Impedance Model,
Biomedizinische Technik, Vol. 42 (1997): 277 - 280.

18. H. Krüger, Zur numerischen Berechnung transienter elektromagnetischer
Felder in gyrotropen Materialien, Der Andere Verlag, 2000.

19. S. Balay et al., PETSc home page (2001), http://www.mcs.anl.gov/petsc

20. S. Balay, W.D. Gropp, L.C. McInnes, B.F. Smith, PETSc Users Manual,
Argonne National Laboratory, ANL-95/11 - Revision 2.1.3 (2002).

21. S. Balay, W.D. Gropp, L.C. McInnes, B.F. Smith, Efficient Management of
Parallelism in Object Oriented Numerical Software Libraries, in E. Arge, A.M.
Bruaset, H.P. Langtangen (Eds.): Modern Software Tools in Scientific
Computing, Birkhäuser (1997): 163 - 202.




Substrate Resistance Modeling by 
Combination of BEM and FEM Methodologies 



E. Schrik and N.P. van der Meijs 

Delft University of Technology / DIMES 
Mekelweg 4 
2628 CD, Delft 
The Netherlands 



Abstract. In present-day ICs, substrate noise can have a significant impact on
performance. Thus, modeling the noise-propagation characteristics of the substrate
is becoming ever more important. Two ways of obtaining such a model are the
Finite Element Method (FEM) and the Boundary Element Method (BEM). The
FEM makes a full 3D discretization of the entire substrate and is very accurate
and flexible, but, in general, it is also slow. The BEM only discretizes contact
areas on the substrate boundary, and is usually faster, but less flexible, because it
assumes the substrate to consist of uniform layers. Sometimes, layout-dependent
doping patterns near the top of the substrate may also play a significant role in
noise propagation. The FEM would easily be able to model such patterns, but it
can often be too slow. The BEM, on the other hand, might not always be accurate
enough. This paper describes a combination of BEM and FEM, which results
in a method that is faster than FEM but more accurate than BEM. Through a
number of experiments, the method is validated and successfully verified against 2
commercially available tools.



Introduction 



In present-day micro-electronic designs, substrate crosstalk can significantly
influence the functionality of the design. In mixed-signal designs, for example, the noise
originating from the switching activity in the digital part can propagate through
the substrate and have a serious negative impact on the behaviour of the analog
part. Similarly, in digital designs, substrate noise can influence the clock
generator (typically a Phase-Locked Loop, PLL), causing fluctuations in the clock
frequency (clock jitter). Thus, modeling the noise-propagation characteristics of the
substrate is becoming ever more important [1-3].

Two principal ways of obtaining such a model are the Finite Element Method
(FEM, as extensively described in e.g. [4]) and the Boundary Element Method
(BEM [5]). The FEM makes a full 3D discretization of the entire substrate, and
therefore it is very accurate and flexible, but, in general, it is also slow. The BEM,
on the other hand, assumes the substrate to consist of uniform layers and only
discretizes contact areas on the substrate boundary. Therefore it is less flexible,
but it can be significantly faster.

Fig. 1. Highly doped channel-stop layer underneath the field oxide (FOX).



Unfortunately, the assumption of a uniform, layered substrate often does not
hold due to the presence of specific, layout-dependent doping patterns near the
top of the substrate (e.g. channel-stoppers, trenches, buried layers or sinkers). An
example of this is presented in Figure 1, which shows a highly-doped channel-stop
layer immediately underneath the Field Oxide. The high doping level of this layer
will cause its resistivity to be low and, as such, it can play an important role in
the noise propagation through the substrate. Even though the FEM would easily be
able to incorporate such patterns into the model, it can often be too slow. The BEM,
on the other hand, might not always be accurate enough, because the patterns do
not form a uniform layer.

In order to circumvent this modeling dilemma, we have developed a combined
BEM/FEM method [6] that is faster than FEM and more accurate than BEM. Even
though the concept of a combined BEM/FEM method is not new [5,7], we do apply
it in a new context. In this paper, we will briefly summarize the method, its proof
and its implementation into the SPACE layout-to-circuit extractor [8], after which
we will describe some successful applications and a comparison to 2 commercially
available tools.

The structure of this paper is as follows: the Background section first gives a brief
background of the BEM and FEM methods and their combination. The Indication of
Proof section briefly summarizes the proof, after which the Implementation section
mentions some details on the implementation of the prototype. Finally, the Experiments
section presents some applications, and the last section states our conclusions.



Background 

In our modeling problem we are interested in finding a resistance network that
represents the substrate. Since resistance relates potential differences to currents,
our modeling approach is to solve a potential problem in a passive 3D domain. In
mathematical terms, this is equivalent to solving the Laplace equation

$$\nabla\cdot(\sigma\nabla\Phi) = 0 \qquad (1)$$

with $\Phi$ the potential, and $\sigma$ the conductivity of the domain. When imposing
continuous boundary conditions on the domain, the resulting potential field will be such
that the energy contained within the domain is minimal. In mathematical terms,
this is equivalent to minimizing the following energy functional:

$$E = \int_{\Omega} \sigma\,\|\nabla\Phi(p)\|^2\,dp \qquad (2)$$

with $\Omega$ representing the entire 3D domain of interest.



Boundary Element Method The Boundary Element Method [5] originated
from the observation that the Laplace equation can be written as a boundary
integral equation by applying Green’s second identity. When imposing Dirichlet
boundary conditions on the contact areas of the substrate and Neumann
boundary conditions on non-contact areas of the substrate, a very simple expression will
remain

$$\Phi(i) = \int_{S_1} k(j)\,G(i,j)\,dj \quad \text{with } i, j \in S_1 \qquad (3)$$

where $S$ is the entire boundary, $S_1 \subset S$ is the entire contact area, $k(j)$ is the
continuous current distribution on $S_1$, and $G(i,j)$ is the so-called Green’s function.

The Green’s function relates the potential in point $i$ to a unit current in point
$j$. It is a fundamental solution to the Laplace equation and, as such, it automatically
ensures minimization of the energy functional (2). For a uniform homogeneous
medium, it looks as follows

$$G(i,j) = \frac{1}{4\pi\sigma r} \qquad (4)$$

where $r = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$ (because contact areas are usually situated
on top of the substrate, the z-coordinates have been omitted from this expression).
The Green’s function basically “encodes” the characteristics of the medium. It exists
for media consisting of multiple uniform layers, but its complexity will drastically
increase with each added layer, and it will eventually become infeasible and even
impossible to compute.

The unknown current distribution $k(j)$ can be approximated by discretization
of the entire contact area into smaller panels (see Figure 2; left) and assuming a
constant current distribution on each panel. By applying the Method of Moments
[9], a set of linear equations can be found from which we can solve a piecewise
constant approximation of the current distribution [3]. From this solution, we can
then easily find the resistance network we were looking for.
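
The panel discretization described above can be sketched as follows. This is a strongly simplified illustration and not the SPACE implementation: it assumes the uniform-medium Green's function in the form $1/(4\pi\sigma r)$, uses point collocation at the panel centers instead of a full Method of Moments integration, approximates the interaction between different panels by a point source, and uses the closed-form potential at the center of a uniformly loaded square panel for the self term. Contact geometry, conductivity and all function names are assumptions.

```python
import numpy as np


def contact_panels(x0, y0, side, n):
    """Split a square contact with lower-left corner (x0, y0) into n x n panels."""
    a = side / n
    c = (np.arange(n) + 0.5) * a
    xx, yy = np.meshgrid(x0 + c, y0 + c)
    centers = np.column_stack([xx.ravel(), yy.ravel()])
    return centers, np.full(n * n, a)


def conductance_matrix(contacts, sigma, n=4):
    """Contact-level conductance matrix from a piecewise constant panel model."""
    centers, sizes, owner = [], [], []
    for k, (x0, y0, side) in enumerate(contacts):
        pts, a = contact_panels(x0, y0, side, n)
        centers.append(pts)
        sizes.append(a)
        owner += [k] * len(pts)
    centers = np.vstack(centers)
    sizes = np.concatenate(sizes)
    owner = np.array(owner)

    # P[i, j]: potential at the center of panel i per unit current on panel j.
    diff = centers[:, None, :] - centers[None, :, :]
    r = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(r, 1.0)  # placeholder, the diagonal is overwritten below
    P = 1.0 / (4.0 * np.pi * sigma * r)
    # Self term: integral of 1/r over a square of side a at its center is
    # 4*a*ln(1 + sqrt(2)), divided by the panel area a^2.
    P[np.diag_indices_from(P)] = (4.0 * np.log(1.0 + np.sqrt(2.0))
                                  / (4.0 * np.pi * sigma * sizes))

    # Incidence matrix: panel i belongs to contact owner[i].
    A = (owner[:, None] == np.arange(len(contacts))[None, :]).astype(float)
    # Unit potential on one contact at a time, then sum the panel currents.
    return A.T @ np.linalg.solve(P, A)


# Two 10 um x 10 um contacts, 30 um apart, on a 10 S/m medium (assumed values).
Y = conductance_matrix([(0.0, 0.0, 10e-6), (40e-6, 0.0, 10e-6)], sigma=10.0)
print("coupling resistance between the contacts:", -1.0 / Y[0, 1], "Ohm")
```

The off-diagonal entries of `Y` are negative mutual conductances, so `-1 / Y[0, 1]` is the resistor connecting the two contacts in the extracted network. The dense matrix `P` is the full system mentioned in the text, which is why its solution dominates the cost for large numbers of panels.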





Fig. 2. Visual representation of the BEM (left) and the FEM (right). For the BEM,
only contact areas on the top boundary of the domain are discretized; for the FEM,
the entire domain is discretized.

The BEM owes its speed to the fact that the integration only takes place over the
contact area $S_1$ of the boundary and to the fact that the Green’s function encodes
the characteristics of the entire medium. The bottleneck, however, is solving the
system of linear equations, which is typically full and potentially large; dedicated,
highly optimized solvers and/or sparsification techniques are required.



Finite Element Method The FEM makes a full 3D tetrahedral discretization 
of the entire domain (see Figure 2; right), and assumes linearity within each tetra- 
hedron. As such, it allows for a very simple mathematical expression to describe 
the electrical interactions between the nodes at the corners of each tetrahedron. 
Through an incidence strategy, a typically very large, but sparse system of equa- 
tions is found for the interactions between the nodes in the system. The field solution 
is then found by imposing boundary conditions and minimizing a discretized ver- 
sion of the energy functional (2), which acts as an alternative formulation of the 
Laplace equation. 

However, for finding a resistance network, the minimization of the energy does 
not have to be carried out explicitly, because the linear mathematical relationships 
between the FEM nodes can be represented by resistors. As such, the FEM dis- 
cretization is equivalent to a large, sparse resistance network and minimization of 
energy is ensured (implicitly) by the circuit simulator during simulation. 
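
The equivalence between a linear tetrahedral element and a small resistor network can be made concrete. The sketch below computes the element conductance matrix $K_{ij} = \sigma V\,\nabla N_i \cdot \nabla N_j$ for linear shape functions $N_i$ on one tetrahedron with volume $V$; the function name and the example values are assumptions, and the assembly over a whole mesh is omitted.

```python
import numpy as np


def tet_conductance(nodes, sigma):
    """Element conductance matrix of a linear tetrahedron.

    nodes: four corner points (x, y, z); sigma: conductivity of the element.
    The off-diagonal entry K[i, j] is minus the conductance of the resistor
    between nodes i and j.
    """
    nodes = np.asarray(nodes, dtype=float)
    M = np.hstack([np.ones((4, 1)), nodes])  # rows [1, x, y, z]
    volume = abs(np.linalg.det(M)) / 6.0
    coeffs = np.linalg.inv(M)                # column i: coefficients of N_i
    grads = coeffs[1:, :].T                  # row i: constant gradient of N_i
    return sigma * volume * grads @ grads.T


# Unit right-angled tetrahedron with an assumed conductivity of 10 S/m.
K = tet_conductance([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], sigma=10.0)
print(np.round(K, 4))
```

Each row of `K` sums to zero, as required for a network without a connection to ground, and a zero off-diagonal entry means that the corresponding pair of nodes is not connected by a resistor. Summing such element matrices over all tetrahedra yields the large, sparse resistance network referred to in the text.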

The 3D discretization will automatically incorporate any inhomogeneities of the 
domain into the model, which lends the FEM its accuracy and flexibility. However, 
the size of the model, despite its sparsity, also renders the FEM rather slow. 



Combined BEM/FEM As presented in [6], a combined BEM/FEM method
may be efficient when both speed and accuracy are required. For this approach,
we effectively define the specific doping patterns (see the introduction) as FEM
domains, and the rest of the underlying substrate as a BEM domain. As such, the
BEM and the FEM “communicate” with each other through an interface.

The combination between the methods is immediately allowed when we choose 
the BEM interface mesh as the Voronoi polygon of the FEM interface nodes [6]. In 
other words, we construct a duality between the BEM and FEM interface meshes, 
such that each FEM node will be associated with one BEM panel (see Figure 3; 
left) and an immediate connection of the resulting networks is allowed (see Figure 
3; right). 

Indication of Proof 

We will use Figure 4 as a step-by-step guide through the process. 

Figure 4a: When imposing continuous boundary conditions on a domain that
is subject to the Laplace equation, a field $\Phi_m$ with minimum energy will manifest
itself in the domain.

Figure 4b: Discretization of the boundary conditions results in a new minimum-energy
field $\Phi'$ that is close to $\Phi_m$. In terms of energy: $E(\Phi') = E(\Phi_m) \pm \varepsilon$.

Figure 4c: We now divide the domain with an interface. If the interface
preserves energy, the solution will be valid, and there will be a continuous potential
match along the interface. The energy will be such that $E(\Phi'_1) + E(\Phi'_2) = E(\Phi')$.

Fig. 3. Left: Top view of the BEM/FEM interface, the BEM mesh is the Voronoi
polygon of the FEM nodes; Right: Side view of the BEM/FEM interface, Voronoi
meshing allows for direct connection of the BEM and the FEM model.

Figure 4d: Discretization of the interface introduces a discrete potential match
along the interface, which causes some “strain” in both fields. Therefore, the total
energy will be slightly larger: $E(\Phi''_1) + E(\Phi''_2) = E(\Phi') + \varepsilon$.

Figure 4e: If we were to introduce BEM and FEM, their difference in basis
functions (piecewise constant and piecewise linear, respectively) would cause a
discontinuity in the field. Therefore, we introduce an h-thin layer along the interface
across which we define a linear interpolation between both fields.

Figure 4f: We now introduce $\Phi_{FEM}$ and $\Phi_{BEM}$. Variable $h$, as defined above,
controls both the thickness of the h-thin layer and the granularity of the dual
interface meshes (this keeps the gradient of the linear interpolation in proportion
as $h \to 0$). Utilizing the convergence properties of BEM and FEM ($O(h^2)$ and $O(h)$,
respectively), we can prove that the energy contained within the linear interpolation
goes to zero as $O(h)$ (see [10] for a more extensive discussion).

The final observation necessary for proving convergence is that the BEM and 
FEM fields both minimize energy (by definition) and that the energy contained 
in the linear interpolation between them contributes a negligible part to the total 




Fig. 4. Different stages in the convergence proof of the combined BEM/FEM
method.

Fig. 5. The combined BEM/2D FEM modeling approach. Left: side view of
theoretical version based on dual meshing; Right: 3D view of implemented version based
on “nodal proximity”.



energy. As such, for a fine enough interface mesh, the total energy in Figure 4f can
always become smaller than the energy in 4d. Additionally, the energy in 4f will
never become smaller than the energy in 4b, because 4b is the exact, minimum
energy solution for this set of boundary conditions. Situation 4f is now enclosed
between 4d and 4b, which proves the convergence.



Implementation 

To test the behaviour of our method in practice, we have implemented a first 
prototype of our method into the SPACE layout-to-circuit extractor. The prototype 
utilizes a 2D FEM instead of a 3D FEM (as schematically represented in Figure 5; 
left). The 2D FEM is a valid modeling methodology, as long as the FEM domain 
is thin and has a significantly lower resistivity than the BEM domain. 

Unfortunately, the concept of dual meshing is hard to implement in practice,
because calculating the BEM for hexagonal panels is more difficult. Therefore, in
our prototype, the combination of BEM and FEM is not based on
dual meshing, but on "nodal proximity" (as drawn in Figure 5; right).



Experiments 

Layout (a) from Figure 6 represents a strip of channel-stop that "connects"
terminals A and B on top of a substrate consisting of a 250 µm-thick epitaxial layer
with conductivity 10 S/m and a backplane metallization. The left part of Figure 7
compares the results of our method to those found with Momentum (RF simulator,
part of the ADS software by Agilent Technologies Inc.) as a function of
channel-stop resistivity and for different mesh settings. Upon increasing the resistivity of
the channel-stop layer, we see an increase of the error, but refining
Momentum's mesh reduces the error again.

The layout from Figure 6(b) represents two separate terminals A and B that
are 30 µm apart, with an island of channel-stop in between. The whole structure is
situated on top of a similar substrate as in the previous experiment. We have varied 
Fig. 6. (a) A simple strip of channel-stop with terminals A and B at the ends,
(b) Two terminals with an "island" of channel-stop in between, (c) Another simple
strip structure with drastically reduced size; the dashed box represents the FEM
domain boundary.

Table 1. Resistance values and computation times. Rab = resistance between 
terminals A and B, Ras = resistance between terminal A and the backplane. 





            Rab (kΩ)       Ras (kΩ)    time (s)
SPACE       5.22           21.1        58
Davinci     [illegible]    33.0        [missing]
the size of the island with the parameter w (which also keeps the island square)
and calculated the resistance network with both SPACE and Momentum. The
results can be found in the right part of Figure 7. SPACE and Momentum clearly
show similar behaviour. There is some offset
between SPACE and Momentum, but this is possibly caused by a minor difference
in the handling of the terminals (A and B).

As a final experiment, we compared our results to those of the 3D FEM device 
simulator Davinci (integrated into Taurus-Workbench, by Synopsys). We used the 
layout shown in Figure 6(c), which is again a strip structure, but scaled down to a 
size that can be handled by Davinci. The experiment setup is similar to the first 
experiment, except that the epitaxial layer is now only 10 µm thick. The results
can be found in Table 1. The table clearly shows that the results of the 3D FEM 
method and the 2D FEM/BEM method are reasonably close, and that the new 
method can be considerably faster. 



Conclusions 

Specific doping patterns in the top layers of the substrate can play an important role 
in substrate crosstalk effects. Thus, it is important to be able to incorporate these 
patterns into our substrate resistance model. Unfortunately, the two main modeling 
methodologies for the substrate, BEM and FEM, both have their disadvantages in 
this respect. The BEM is not accurate enough and the FEM is (usually) too slow. 




Fig. 7. Left: The direct resistance R(A,B) (in kΩ) between terminals A and B from
Figure 6(a) for increasing values of the resistivity of the channel-stop strip and for
different mesh settings. Right: The direct resistance between terminals A and B from
Figure 6(b) for increasing size of the channel-stop island.



Therefore, this paper presents a combined BEM/FEM methodology, which is more
accurate than BEM, and significantly faster than FEM. The speed and the accuracy
of the method compare well with those of two commercially available tools. Our
current research concentrates on model reduction and solution schemes for fast
global solution of the combined BEM/FEM equations.
Abstract. The numerical simulation of the induction heating of three-dimensional
metal bodies by moving inductors can be problematic from several points of view
when using standard FE or BE schemes. In this paper we describe the main
difficulties and propose an alternative modelling approach based on the reformulation
of Maxwell's equations into a system of second-kind Fredholm integral
equations. These equations for the eddy currents are coupled with the heat transfer
equation with non-linear temperature-dependent material parameters. Mathematical
analysis of the existence and uniqueness of the solution to the continuous as well
as the discrete problem is provided and convergence of the numerical scheme is shown.
An illustrative numerical example is presented.



Introduction 



Induction heating is becoming more and more popular because it is among
the most efficient, safe and ecological technologies for the heating of metals.
It is based on the generation of eddy currents and consequent Joule losses in the
metal body. These internal heat sources are capable of producing a convenient
time-dependent temperature distribution, minimizing the temperature gradients within
the metal body and dramatically reducing the danger of surface damage by
oxidation and other chemical changes.

Efficiency and other parameters of the process depend on a number of
factors. Important examples are the arrangement of the inductor (which may
produce a transversal, longitudinal or generally oriented electromagnetic field within the
heated metal body), its position with respect to the body, the presence or absence of a
magnetic circuit, the frequency of the field current, etc. A good and economical design
of such a device, therefore, must be based on a sufficiently accurate modelling of
the process.

As the induction heating is driven by the eddy currents which can be derived 
from the electromagnetic field, one could ask why the standard Maxwell’s equations, 
discretized by standard methods, are not enough to resolve the problem. Let us 
briefly list the main obstacles making the use of standard schemes difficult.
a) Moving inductors make the use of FEM prohibitive due to the necessity of
repeated reconstruction of the finite element mesh. Moving grid techniques do not 
help here due to the very complex and global nature of the inductor motions. 

b) Remeshing can be difficult or infeasible even for problems with stationary 
inductors. The reason is the geometrical incommensurability of the heated body 
(bodies), inductors (formed by curved conductors, coils and/or their combinations) 
and the surrounding air subdomain. 

c) Maxwell’s equations resolve the complete electromagnetic field in the whole 
domain, in particular also in the surrounding air subdomain where no induction
heating occurs. This considerably increases the computational costs. 

d) Formulation of boundary conditions for Maxwell’s equations can be prob- 
lematic due to the geometrical incommensurability of the subdomains (e.g. for 3D 
inductors represented as 1D wires).

Alternative modelling approach: 

The aim of this presentation is to show how the mentioned obstacles can be
overcome for non-ferromagnetic metals by reformulating Maxwell's equations
into a system of complex second-kind Fredholm integral equations describing the
behaviour of the eddy currents in the heated body. The eddy currents are used for
the evaluation of the Joule losses that enter the heat transfer equation through a
source term. In this way we eliminate the necessity of remeshing the air subdomain
and of resolving the electromagnetic field in it, and effectively allow for
computations involving moving inductors. We deal with a time-harmonic electromagnetic
field and consider general temperature-dependent material properties $\gamma$, $\lambda$ and $\rho c$
in this study.

The example presented in this study comes from a difficult industrial simulation 
dealing with the heating of a brass workpiece of a nontrivial shape by a tubular 
water-cooled rotating inductor. This example is not expected to be easily solvable
by means of standard finite element or boundary element methods.

Mathematical background of the method is established by proving the exis- 
tence and the uniqueness of solution for the continuous as well as discrete problem. 
Convergence results for the numerical scheme are also provided. 



Description of the Technical Problem 



A bounded metal body $\Omega_1$ with a Lipschitz-continuous boundary is heated by
an inductor formed by a system of conductors and/or coils $\Omega_2$ (see Fig. 1). For
simplicity, let the conductors and coils carry identical harmonic current $I_{ext}$ of
angular frequency $\omega$. The inductor contains no ferromagnetic parts. Due to the
absence of any non-linearities within the investigated domain all quantities of the
electromagnetic field may be expressed in terms of their phasors. Let us mention,
e.g., copper, brass and stainless steel as a few examples of non-ferromagnetic metals.




Induction Heating of 3D Metal Bodies 







Fig. 1. Basic arrangement of the device.



The coupled electromagnetic-thermal model 

Let us consider a point $Q \in \Omega_1$. Using the Coulomb gauge, the phasor $A$ of the
vector potential at this point is given by the superposition of two components
excited by the uniform field current $I_{ext}$ and the eddy currents $J_{eddy}$:

\[ A(Q) = A_{PQ} + A_{RQ} = \frac{\mu_0 I_{ext}}{4\pi} \int_{\Omega_2} \frac{dl(P)}{r_{PQ}} + \frac{\mu_0}{4\pi} \int_{\Omega_1} \frac{J_{eddy}(R)}{r_{RQ}} \, dR. \tag{1} \]

Here, $\mu_0$ denotes the permeability of vacuum, $dl(P) = (dx(P), dy(P), dz(P))^T$ is a vector
denoting the elementary length of conductor of the field coils and $dR$ means the
elementary volume of $\Omega_1$. Remaining symbols follow from Fig. 1.
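The coil contribution to (1) can be illustrated with a short numerical sketch. It assumes a thin closed conductor approximated by straight segments and a midpoint rule for the line integral; the function names, the loop radius and the segment count are hypothetical choices for illustration, not part of the method described in this paper.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # permeability of vacuum [H/m]


def coil_vector_potential(q, nodes, current):
    """Midpoint-rule estimate of mu0*I/(4*pi) * integral of dl(P)/r_PQ
    over a closed polyline given by its nodes (illustrative helper)."""
    nodes = np.asarray(nodes, dtype=float)
    nxt = np.roll(nodes, -1, axis=0)
    dl = nxt - nodes                      # segment vectors dl(P)
    mid = 0.5 * (nodes + nxt)             # segment midpoints P
    r = np.linalg.norm(mid - np.asarray(q, dtype=float), axis=1)  # r_PQ
    return MU0 * current / (4.0 * np.pi) * (dl / r[:, None]).sum(axis=0)


def circular_loop(radius, n):
    """Nodes of a regular n-gon approximating a circular loop in the plane z = 0."""
    phi = 2.0 * np.pi * np.arange(n) / n
    return np.column_stack([radius * np.cos(phi), radius * np.sin(phi), np.zeros(n)])


loop = circular_loop(0.05, 400)           # assumed 5 cm loop radius
a_center = coil_vector_potential([0.0, 0.0, 0.0], loop, 320.0)
a_offset = coil_vector_potential([0.02, 0.0, 0.0], loop, 320.0)
```

By symmetry the potential vanishes at the loop centre, while at an off-centre point on the x-axis only the azimuthal (here y) component remains, which gives a quick sanity check of the quadrature.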

The second Maxwell equation yields $\operatorname{rot} E = -\partial B/\partial t = -\partial(\operatorname{rot} A)/\partial t$.
Interchanging the order of the operators, we obtain $E = -\partial A/\partial t - \operatorname{grad}\varphi$, where $\varphi$
denotes the scalar potential. Applying this equation to the body that is not connected
to any external source of voltage ($\operatorname{grad}\varphi = 0$) and rewriting it in terms of the
corresponding phasor quantities, we finally obtain $E = -j\omega A$ and $J_{eddy} = \gamma E$.
Hence,

\[ J_{eddy} = -j\omega\gamma A, \tag{2} \]



where $\gamma = \gamma(T)$ denotes the temperature-dependent electrical conductivity of the
metal and $\omega$ the angular frequency of the field current. Substitution of (2) into (1)
provides the basic integral equation for $J_{eddy}$:

\[ j J_{eddy}(Q) - \kappa(Q) \int_{\Omega_1} \frac{J_{eddy}(R)}{r_{RQ}} \, dR = \kappa(Q) I_{ext} \int_{\Omega_2} \frac{dl(P)}{r_{PQ}}, \tag{3} \]

where $\kappa(Q) = \omega\gamma(T(Q))\mu_0/(4\pi)$. For each bounded and piecewise-continuous
temperature distribution $T$ in $\Omega_1$, $\kappa$ is a bounded and piecewise-continuous function
greater than a positive constant. The specific average Joule losses $w_{Ja}$ in the metal
body are then given by the formula



\[ w_{Ja} = \frac{J_{eddy} \cdot J^*_{eddy}}{\gamma}, \tag{4} \]

where $J^*_{eddy}$ is the complex conjugate of $J_{eddy}$. The non-stationary distribution of
the temperature in the metal body is generally described by the equation



\[ \operatorname{div}(\lambda \operatorname{grad} T) = \rho c \, \frac{\partial T}{\partial t} - w_{Ja}, \tag{5} \]

where $\lambda = \lambda(T)$ denotes the thermal conductivity, $\rho = \rho(T)$ the specific mass of the
heated material, $c = c(T)$ its specific heat and $w_{Ja}$ the specific Joule losses given by
(4). The boundary condition along the whole surface of the body reads (radiation
is not considered)




\[ -\lambda \frac{\partial T}{\partial n} = \alpha (T - T_{ext}), \tag{6} \]

where $\alpha$ denotes the coefficient of the convective heat transfer, $T_{ext}$ the temperature
of the surrounding medium (moving or quiet air) and $n$ the direction of the outward
normal.



Analysis of solvability and uniqueness 

The phasor equation (3) may easily be subdivided into three identical equations
(for the components in spatial directions x, y, z) of a complex form. For the
x-component, we obtain

\[ j J_{eddy,x}(Q) - \kappa(Q) \int_{\Omega_1} \frac{J_{eddy,x}(R)}{r_{RQ}} \, dR = \kappa(Q) I_{ext} \int_{\Omega_2} \frac{dx(P)}{r_{PQ}}. \tag{7} \]

The symbol $dx(P)$ means $dl(P) \cdot e_x$, where $e_x$ is the unit vector in the x-direction.
Using the notation



-Lx(v) = (v), (8) 

F.(v) = «(v) . (Re{7,,J,Im{/,,J)^ , 



( 9 ) 
($P, Q \in \Omega_1$ are renamed to $u, v$), we can rewrite (7) into an operator form

\[ (I + K) L_x = F_x \tag{10} \]

with

\[ (K L_x)(v) = \kappa(v) \int_{\Omega_1} k(v, w) L_x(w) \, dw, \tag{11} \]


k(v, w) = 

\v- w\ 


-M, 


(12) 



We consider the operator $K$ to be defined as $K : [L^2(\Omega_1)]^2 \to [L^2(\Omega_1)]^2$. It is easy to
see that the operator is antisymmetric. Antisymmetric operators have only purely
imaginary eigenvalues. Therefore, as 1, -1 cannot lie in the spectrum of $K$, we
immediately have the solvability, uniqueness and continuous dependence on the
right-hand side for (10). An analogous conclusion holds, of course, for the remaining
spatial components.
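The finite-dimensional analogue of this argument is easy to check numerically. The sketch below uses a random real antisymmetric matrix as a stand-in for a discretized operator (an assumption made purely for illustration; it is not the operator $K$ of (11)) and verifies that its eigenvalues are purely imaginary and that $I + K$ does not shrink any vector, which implies invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
k = m - m.T                         # real antisymmetric matrix: k.T == -k

eigvals = np.linalg.eigvals(k)      # expected to lie on the imaginary axis

# For antisymmetric k, x.(k x) = 0, hence |(I + k) x|^2 = |x|^2 + |k x|^2 >= |x|^2.
x = rng.standard_normal(6)
norm_x = np.linalg.norm(x)
norm_mapped = np.linalg.norm((np.eye(6) + k) @ x)
```

The inequality in the comment is what rules out -1 as an eigenvalue; the same computation with $I - K$ rules out +1.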

There are no problems with the existence and the uniqueness of solution for 
the parabolic heat transfer equation (5) in a weak sense as all the temperature- 
dependent material parameters are Lipschitz-continuous functions. 

Analysis of the solvability and uniqueness of the discrete problem is performed 
in an analogous way. 



Discretization 

We discretize the integral equation by piecewise-constant functions on structured
hexahedral meshes. The degrees of freedom for the eddy currents correspond to
cell centers. The resulting dense systems of linear equations are solved by means of
both a problem-optimized Gaussian elimination and ILU preconditioned iterative
solvers. The Gaussian elimination is in some sense more advantageous since it can
easily take into account the specific structure of the matrix and makes it easy to
implement the simultaneous solution of multiple right-hand sides (corresponding to the three
spatial components of the eddy currents).
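A minimal sketch of such a piecewise-constant discretization of (3) is given below. It is not the authors' implementation: it assumes collocation at cell centers with a one-point rule for off-diagonal entries, replaces the singular self-term by the exact integral of 1/r over a sphere of equal volume, uses an assumed constant conductivity and a hypothetical circular coil, and solves all three spatial components at once with a dense direct solver.

```python
import numpy as np

MU0 = 4e-7 * np.pi                                    # permeability of vacuum [H/m]


def cell_centers(nx, ny, nz, h):
    """Centers of an nx x ny x nz block of cubic cells with edge length h."""
    idx = np.indices((nx, ny, nz)).reshape(3, -1).T
    return (idx + 0.5) * h


def body_matrix(centers, h):
    """W[i, j] ~ integral over cell j of dR / |c_i - R| (one-point rule)."""
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(dist, 1.0)                       # dummy value; diagonal is set below
    w = h**3 / dist
    r_s = h * (3.0 / (4.0 * np.pi)) ** (1.0 / 3.0)    # radius of the equal-volume sphere
    np.fill_diagonal(w, 2.0 * np.pi * r_s**2)         # integral of 1/r over that sphere
    return w


def coil_integrals(centers, nodes):
    """S[i] ~ integral over a closed polyline of dl(P) / r_PQ for Q = c_i."""
    nxt = np.roll(nodes, -1, axis=0)
    dl, mid = nxt - nodes, 0.5 * (nodes + nxt)
    r = np.linalg.norm(centers[:, None, :] - mid[None, :, :], axis=2)
    return (dl[None, :, :] / r[:, :, None]).sum(axis=1)


h = 2e-3                                              # 2 mm cells, as in the example below
centers = cell_centers(4, 4, 2, h)
gamma = np.full(len(centers), 1.5e7)                  # assumed conductivity [S/m]
omega = 2.0 * np.pi * 150e3                           # field frequency 150 kHz
kappa = omega * gamma * MU0 / (4.0 * np.pi)

phi = 2.0 * np.pi * np.arange(200) / 200              # hypothetical circular coil
coil = np.column_stack([4e-3 + 0.02 * np.cos(phi),
                        4e-3 + 0.02 * np.sin(phi),
                        np.full(200, 2e-3)])

system = 1j * np.eye(len(centers)) - kappa[:, None] * body_matrix(centers, h)
rhs = kappa[:, None] * 320.0 * coil_integrals(centers, coil)
j_eddy = np.linalg.solve(system, rhs)                 # columns: x, y, z components
w_joule = np.sum(j_eddy * np.conj(j_eddy), axis=1).real / gamma   # formula (4)
```

Passing a three-column right-hand side to a single direct solve mirrors the remark above about multiple right-hand sides: the factorization of the dense matrix is shared by all spatial components.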

Equation (5) is semi-discretized in space using the method of lines and
integrated in time by means of higher-order multistep methods, correcting the position
of the inductors and the temperature-dependent material parameters $\lambda$, $\gamma$ and $\rho c$
after each time step. The size of the time step is driven by the usual criteria for the
time-integration of parabolic equations.
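The time-stepping loop can be sketched in one space dimension as follows. This is a simplified illustration under stated assumptions, not the scheme used in the paper: a cell-centered finite-volume semi-discretization of (5) with the convective condition (6) at both ends, an explicit two-step Adams-Bashforth method standing in for the higher-order multistep methods, illustrative material laws, and a constant source in place of the eddy-current solve.

```python
import numpy as np


def heat_rhs(temp, dx, lam, rho_c, source, alpha, t_ext):
    """Semi-discrete dT/dt for cell-centered unknowns on a 1D rod."""
    lam_c = lam(temp)
    flux = np.empty(temp.size + 1)                    # heat flux in +x direction at faces
    flux[1:-1] = -0.5 * (lam_c[:-1] + lam_c[1:]) * (temp[1:] - temp[:-1]) / dx
    flux[0] = -alpha * (temp[0] - t_ext)              # outward normal is -x at the left end
    flux[-1] = alpha * (temp[-1] - t_ext)             # outward normal is +x at the right end
    return (source - (flux[1:] - flux[:-1]) / dx) / rho_c(temp)


def integrate_ab2(temp0, dt, n_steps, rhs, update_source):
    """Explicit two-step Adams-Bashforth with an Euler start-up step."""
    temp = temp0.copy()
    f_old = None
    for step in range(n_steps):
        # Placeholder for moving the inductor and re-solving the eddy currents.
        source = update_source(step * dt, temp)
        f_new = rhs(temp, source)                     # material data evaluated at current T
        if f_old is None:
            temp = temp + dt * f_new
        else:
            temp = temp + dt * (1.5 * f_new - 0.5 * f_old)
        f_old = f_new
    return temp


def lam(t):                                           # assumed thermal conductivity [W/(m K)]
    return 110.0 * (1.0 + 1e-4 * (t - 20.0))


def rho_c(t):                                         # assumed volumetric heat capacity [J/(m^3 K)]
    return 3.2e6 * (1.0 + 1e-4 * (t - 20.0))


temp0 = np.full(20, 20.0)                             # 20 cells of 2 mm, initially 20 degC
temp_end = integrate_ab2(
    temp0, 0.01, 1000,
    lambda t, s: heat_rhs(t, 2e-3, lam, rho_c, s, 25.0, 20.0),
    lambda time, t: np.full(t.size, 1e6),             # assumed uniform Joule source [W/m^3]
)
```

The time step of 0.01 s respects the explicit stability limit for these assumed material values; with stiffer data an implicit integrator would be the natural replacement.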



Convergence of the numerical scheme 

Let us consider the continuous problem (10). For simplicity, let us further consider
that the domain $\Omega_1$ is covered by the discretization mesh exactly ($\Omega_{1,h} = \Omega_1$).
Fig. 2. Geometry of the heated workpiece and inductor.




Fig. 3. Geometry, mesh and initial temperature distribution. 



With a function $\kappa_h$ obtained by elementwise averaging the function $\kappa$ from (7), we
can write the discrete problem for the eddy current density

\[ (I + K_h) L_{x,h} = F_{x,h}, \tag{13} \]

with

\[ (K_h L_{x,h})(v) = \kappa_h(v) \int_{\Omega_1} k(v, w) L_{x,h}(w) \, dw. \tag{14} \]
Fig. 4. Temperature distribution at t = 60 s (ranges from 152 to 190 °C) and at
t = 120 s (ranges from 263 to 302 °C).




Fig. 5. Temperature distribution at t = 180 s (ranges from 348 to 388 °C) and at 
t = 240 s (ranges from 418 to 458 °C). 



Subtracting (13) from (10) we obtain that $L_{err} = L_x - L_{x,h}$ is governed by

\[ L_{err} = (I + K)^{-1} \left[ F_{err} - (K - K_h) L_{x,h} \right], \tag{15} \]

where obviously $F_{err} = F_x - F_{x,h} \to 0$ as the grid diameter $h \to 0$, $K - K_h \to 0$
as $h \to 0$ from the definition of $K_h$, and $L_{x,h}$ is bounded from the compactness of
$(I + K_h)^{-1}$.
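The algebra behind the error equation can be verified with small random matrices. The sketch below is only a finite-dimensional check of the identity, with arbitrary matrices standing in for $K$, $K_h$ and arbitrary vectors for $F_x$, $F_{x,h}$ (all hypothetical data).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
identity = np.eye(n)

k = 0.1 * rng.standard_normal((n, n))              # stand-in for K
k_h = k + 1e-3 * rng.standard_normal((n, n))       # perturbed stand-in for K_h
f = rng.standard_normal(n)                         # stand-in for F_x
f_h = f + 1e-3 * rng.standard_normal(n)            # stand-in for F_{x,h}

l = np.linalg.solve(identity + k, f)               # (I + K) L = F
l_h = np.linalg.solve(identity + k_h, f_h)         # (I + K_h) L_h = F_h

l_err_direct = l - l_h
l_err_formula = np.linalg.solve(identity + k, (f - f_h) - (k - k_h) @ l_h)
```

Both ways of computing the error agree to rounding, which confirms that the error is controlled by the consistency terms $F_{err}$ and $(K - K_h) L_{x,h}$.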

It is reasonable to suppose that the temperature-dependent material
parameters $\gamma$, $\lambda$ and $\rho c$ are Lipschitz-continuous functions of the temperature. Then,
convergence in the source terms of the heat transfer equation (5) yields also the
convergence of the whole discrete coupled model for all finite times.
Numerical example: Heating of a brass workpiece

A brass workpiece of a nontrivial shape (depicted in Fig. 2) is heated by a
coil-shaped inductor formed by a hollow tubular water-cooled conductor. The
arrangement of the system is obvious from parts A and B of Fig. 2 and from Fig.
3. All cells are hexahedral of a uniform size 2 x 2 x 2 mm. The workpiece stands
on the xy-plane and the T-shaped face matches the points [0.012, 0.002], [-0.08,
0.002], [-0.08, 0.012], [-0.012, 0.012], [-0.012, -0.012], [-0.08, -0.012], [-0.08, -0.002]
and [0.012, -0.002]. The field current in the inductor is $I$ = 320 A and its frequency $f$ =
150 kHz. As the inductor is formed by a massive conductor, it was substituted by
8 thinner conductors located at points indicated in part C of Fig. 2. Each of the
eight conductors carries a current $I_k$ = 40 A, k = 1, 2, ..., 8. The inductor rotates
around the heated body with an angular frequency $\omega$ corresponding to one $2\pi$-turn
in 72 seconds. The initial temperature $T_{start}$ of the body and the temperature $T_{ext}$ of
the surrounding air are 20 °C. The coefficient $\alpha$ of the convective heat transfer is 25
W/(m² K). We are interested in the temperature evolution during the first 240 seconds
of heating.
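The derived quantities of this setup follow from simple arithmetic, sketched below. The constant and function names are hypothetical; the numerical values are those stated in the text.

```python
import math

ROTATION_PERIOD_S = 72.0        # one 2*pi turn of the inductor
TOTAL_CURRENT_A = 320.0         # field current in the massive conductor
N_CONDUCTORS = 8                # thin conductors replacing the massive one
FIELD_FREQUENCY_HZ = 150e3

omega_rotation = 2.0 * math.pi / ROTATION_PERIOD_S      # mechanical rotation [rad/s]
omega_field = 2.0 * math.pi * FIELD_FREQUENCY_HZ        # electrical angular frequency [rad/s]
current_per_conductor = TOTAL_CURRENT_A / N_CONDUCTORS  # [A]


def inductor_angle_deg(t_seconds):
    """Angular position of the inductor after t seconds, reduced to [0, 360)."""
    return math.degrees(omega_rotation * t_seconds) % 360.0


angles = {t: inductor_angle_deg(t) for t in (60, 120, 180, 240)}
```

At the four output times the inductor has turned by roughly 300, 240, 180 and 120 degrees (modulo full turns), so the snapshots sample quite different inductor positions; the two angular frequencies differ by about seven orders of magnitude, which is consistent with treating the field as time-harmonic within each thermal time step.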

In Figures 4 and 5 we present the evolution of the temperature within the
investigated metal body at times t = 60, 120, 180 and 240 seconds. Fig. 6 depicts
the temperature along several important surface lines.




Fig. 6. Temperature at the time t = 240 s along the lines A: (-0.006, 0, z); B:
(-0.01, -0.012, z); C: (-0.01, 0.012, z) and D: (0.012, 0, z).



Outlook:

The presented methodology will be further developed on both the modelling and
numerical levels. Currently, the main modelling focus is the extension to
ferromagnetic metals. On the numerical side, we are developing a higher-order
discretization for the integral equations on unstructured tetrahedral grids, and studying the
possibility of improving the efficiency of the solution of the arising dense systems of
linear algebraic equations. Implicit solvers for the heat transfer equation will be
implemented.
Abstract. The application of multigrid (MG) methods for the solution of
electromagnetic problems has attracted attention in recent years (e.g. [8]). These problems
are related to bilinear forms $(\operatorname{curl}\,\cdot, \operatorname{curl}\,\cdot)_{L^2} + \alpha\,(\cdot,\cdot)_{L^2}$ with real $\alpha$, which require special
smoothers as presented in [1,7]. This paper shows by numerical experiments that
these ideas also work for the time-harmonic eddy currents, i.e. for complex bilinear
forms $(\operatorname{curl}\,\cdot, \operatorname{curl}\,\cdot)_{L^2} + i\alpha\,(\cdot,\cdot)_{L^2}$. Furthermore, an approximate projection
procedure is presented that allows the application of multigrid to an un-gauged electric
formulation even if there are regions with zero conductivity. Numerical results are
shown for the TEAM Workshop problem 7.



Introduction 

The time-harmonic eddy-current approximation of Maxwell’s equations is usually 
formulated as a second order PDE (partial differential equation). Two different 
formulation classes are possible, either the electric or the magnetic one. Here, we 
choose the electric one with the complex amplitude E of the electric field as the 
primal variable. 

We consider a bounded domain $\Omega$ that is split into two (open) disjoint subsets
$\Omega_C$ and $\Omega_I$ occupied by conductive and perfectly insulating material respectively.
$\Omega_C$ is the union of all $N$ conductors $\Omega_{C,i}$, which are not connected to each other
(see Fig. 1). For simplicity, we assume that $\Omega_I$ is connected, $\Omega$ is simply connected
and none of the $\Omega_{C,i}$ touches the boundary $\partial\Omega$. Note that these restrictions can be
relaxed with some technical overhead. Finally, $\Omega_G \subset \Omega_I$ denotes the support of
generator current densities $j_G$ that are assumed to be divergence-free and orthogonal
to all harmonic Dirichlet vector fields in $\Omega_I$.

For a weak formulation of the problem, we introduce the (complex) spaces 



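\[ \mathcal{U} = \{ E \in H(\operatorname{curl}, \Omega) : n \times E|_{\partial\Omega} = 0 \} \]

and

\[ V = \{ \lambda \in H^1(\Omega_I) : \lambda|_{\partial\Omega} = 0, \ \lambda|_{\partial\Omega_{C,i}} = \text{const.} \ \forall i \in \{1, \dots, N\} \} \]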
Fig. 1. A typical eddy-current setting with N = 2 conductors

and define bilinear forms

\[ a(E, E') = \int_{\Omega} \frac{1}{\mu} \operatorname{curl} E \cdot \operatorname{curl} E' + i\omega \int_{\Omega_C} \sigma E \cdot E', \]

\[ b(E, \lambda) = \int_{\Omega_I} E \cdot \operatorname{grad} \lambda \]

and the source functional

\[ f(E') = -i\omega \int_{\Omega_G} j_G \cdot E' \]

with the usual symbols $\mu$ for permeability, $\sigma$ for conductivity and $\omega$ for the angular
frequency. A variational formulation of the time-harmonic eddy-current problem
then reads: Find $E$ in $\mathcal{U}$ and $\lambda$ in $V$, such that

\[ a(E, E') + b(E', \lambda) = f(E') \quad \forall E' \in \mathcal{U}, \tag{1a} \]

\[ b(E, \lambda') = 0 \quad \forall \lambda' \in V. \tag{1b} \]

The constraint (1b) enforces weak solenoidality of $E$ in $\Omega_I$ and, due to the
non-local condition in the definition of the space $V$, it also forces the total charge on
every conductor $\Omega_{C,i}$ to zero. This fixes what can be designated as the electrostatic
part of $E$.

However, in most instances in eddy-current modeling, we are only interested in
the magnetic field and the current densities. Here, we adopt this restricted modeling
task. Since the magnetic field and the current densities are independent of whether
and how we fix the electrostatic part, equation (1b) plays the role of a gauge, i.e. in
general, $E|_{\Omega_I}$ is not the physical electric field but rather resembles a vector potential
for $B$.

Hence, we drop (1b) without loss of modeling information and switch to the
un-gauged formulation: Find $E$ in $\mathcal{U}$, such that

\[ a(E, E') = f(E') \quad \forall E' \in \mathcal{U}. \tag{2} \]

This seems attractive since we do not end up with a saddle-point formulation. 
Multigrid Method

For eddy-current problems, the application of multigrid (MG) methods (see e.g. [6])
is appropriate, since these are the fastest methods currently available for solving
linear systems of equations derived from a discretization of second-order PDEs.
For the implementation of the finite element discretization with the lowest-order
edge elements on simplices and an MG scheme, we use the UG (shorthand for
unstructured grids) simulation environment, which provides very general tools for
the generation and manipulation of unstructured meshes as well as a flexible data
layout. UG also includes many numerical algorithms already implemented (see
[2]).

To take the different natures of $E$ in $\Omega_C$ and $\Omega_I$ into account, we first restrict
our view to MG for a conductive domain.



Multigrid in Conductive Regions 

It is well known (see [1,7]) that the standard smoothers do not work for problems
involving the curl curl-operator. In the cited papers, special smoothers are
constructed to deal with operators of the form (curl curl + $\alpha$ id), $\alpha$ real and positive.
Operators of this form arise in every time-step for the time-dependent eddy-current
problem if we have a positive conductivity in the whole domain $\Omega$. For a convergent
MG scheme, a special overlapping block smoother is suggested in [1], whereas in
[7] a "hybrid" smoother with an additional smoothing step in the space of scalar
potentials is presented; see the latter reference for algorithmic details.

Although the time-harmonic eddy-current problem, associated with complex
operators of the form (curl curl + i$\alpha$ id), is not yet covered by the theory, our
experiments show that, using the proposed smoothers, MG also works for the
complex case.

The following experiments are done with the hybrid smoother. MG is used 
together with Krylov acceleration, since this is the relevant case for practical ap- 
plications: 



Experiment 1 (2D) We consider a hierarchy of grids on a conductive unit
square. The grid on level 0 (base level) consists of 2 triangles. The grids at higher
levels are constructed by regular refinement. Thus, the number of elements is
multiplied by 4 from level $l$ to $l+1$, see Figure 2. Natural boundary conditions are
imposed, the angular frequency and material coefficients are chosen such that the
bilinear form can be written as

\[ a(E, E') = \int_{\Omega} \operatorname{curl} E \cdot \operatorname{curl} E' + i\alpha \int_{\Omega} E \cdot E' \]

with $\alpha = 7.16$ leading to a penetration depth $\delta \approx 0.5$.
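The quoted penetration depth can be reproduced with a one-line computation. The sketch assumes the standard skin-depth formula $\delta = \sqrt{2/(\omega\sigma\mu)}$ and identifies $\alpha$ with $\omega\sigma\mu$ in the scaled bilinear form above; the function name is hypothetical.

```python
import math


def penetration_depth(alpha):
    """Skin depth for the scaled operator curl curl + i*alpha*id,
    assuming alpha = omega * sigma * mu."""
    return math.sqrt(2.0 / alpha)


delta = penetration_depth(7.16)   # about 0.53, i.e. roughly half the unit square
```

With this choice the fields decay over about half the edge length of the unit square, so the test problem is neither diffusion-dominated nor close to the static limit.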

A multigrid V(1,1)-cycle with the hybrid smoother as the pre- and post-smoother
is used as a pre-conditioner for a BiCGStab [10] solver. Note that, for problems
with a complex symmetric structure like the one under consideration, a
transpose-free QMR (see e.g. [4]) would be more appropriate, but is not available in the
UG environment yet.
no. of levels       2  3  4  5  6  7  8
no. of iterations   4  4  5  5  5  5  5



Fig. 2. Grids on levels 0 to 5 of experiment 1 and number of iterations of the 
BiCGStab solver with MG pre-conditioner 



In the table in Fig. 2, the number of iterations of the BiCGStab solver for a 
defect reduction by a factor 10~^, measured in the Euclidian norm, is displayed 
(vanishing RHS, random initial-guess). The experiment shows that the number of 
iterations is independent of the grid size for the higher levels. 
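To put the iteration counts in perspective, the problem sizes of the hierarchy can be tabulated. The sketch below assumes that a hierarchy with n levels has finest level n - 1, that regular refinement of the two-triangle unit square places the vertices on a uniform lattice, and that the number of unknowns of lowest-order edge elements with natural boundary conditions equals the number of edges; these are assumptions for illustration and the helper name is hypothetical.

```python
levels = [2, 3, 4, 5, 6, 7, 8]
iterations = [4, 4, 5, 5, 5, 5, 5]      # BiCGStab iterations from the table in Fig. 2


def mesh_size(finest_level):
    """Triangles and edges of the regularly refined two-triangle unit square."""
    m = 2 ** finest_level                # lattice intervals per side
    triangles = 2 * 4 ** finest_level
    vertices = (m + 1) ** 2
    edges = vertices + triangles - 1     # Euler's formula: V - E + F = 1
    return triangles, edges


table = [(n, *mesh_size(n - 1), it) for n, it in zip(levels, iterations)]
```

Under these assumptions the number of edge unknowns grows from 16 to roughly 49 000 across the table while the iteration count changes by at most one.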



Experiment 2 (3D) We adopt the parameters of experiment 1 and consider the 
unit ball in 3D. As we use a boundary projection of the surface nodes on refined 
levels, we do not end up with nested finite element spaces. Note that a boundary 
projection is important to avoid coarse grids that are too fine on the one hand (then 
the application of MG does not pay off) and geometric discretization errors that 
are too large on the other hand if we deal with curved boundaries. The grids of 
the first 4 levels are shown in Figure 3. Again, the number of BiCGStab iterations 




no. of levels 2 3 4 5 6 

no. of iterations 5 6 6 6 6 



Fig. 3. Grids on levels 0 to 3 of experiment 2 and number of iterations of the 
BiCGStab solver with MG pre-conditioner 



shows no dependence on the mesh width for fine grids (Figure 3). 
Multigrid and the Presence of Insulators

As we use the un-gauged formulation (2) instead of the full saddle point problem 
(1), we unfortunately have to deal with a singular stiffness matrix if insulating 
regions are present. In order to guarantee solvability, we assume discrete consistent 
right hand sides. These can be constructed by an appropriate a priori computation 
of the source current densities. 



Projection As mentioned above, without the gauge (1b), the solution $E$ in $\Omega_I$
is unique except for an electrostatic part. During an MG solution procedure, the
$L^2$-norm of this part can reach very large values, possibly leading to cancellation
errors. To prevent this, a procedure is employed that fixes the electrostatic part
of the solution by projection on the subspace defined by (1b), leaving the defect
unchanged. Hence, one has to find a $\phi$ in $V$, such that

\[ \int_{\Omega_I} \operatorname{grad}\phi \cdot \operatorname{grad}\phi' = -b(E, \phi') \quad \forall \phi' \in V \tag{3} \]

and update the electric field

\[ E \leftarrow E + \operatorname{grad}\phi \]

after one or several MG cycles.

Since the procedure basically fixes the weak divergence of $E$ in $\Omega_I$ at zero, the
algorithm can be referred to as a "weak divergence correction scheme". However,
it also has to remove admixtures of harmonic Dirichlet vector fields (cohomology
vector fields) that are also ruled out by (1b). Therefore, the matrix associated with
the discretization of (3) will not be completely sparse, giving rise to the
application of some Schur-complement solver. However, for the implementation in the
UG environment, we decided to use a different method that leads to completely
sparse matrices. In this way, it is possible to use all solver and smoothing procedures
available in UG without any modification. We have to pay for this coding
convenience by an a priori computation of a (weakly) solenoidal basis of the harmonic
Dirichlet vector fields:

The implemented projection procedure consists of a pre-processing step and
the projection step itself, see Algorithm 6. In the pre-processing step, a (weak)
divergence operator $B$ and the discrete Laplacian $L$ are assembled. Furthermore,
a scalar potential for a Dirichlet basis and the Maxwell potential coefficients $p_{ij}$
(e.g. see [9]) are computed. This requires solving $N$ Laplace problems with Dirichlet
boundary conditions. $N$ is the Betti number of dimension 2 of $\Omega_I$, i.e. the number of
"holes" in the insulator (occupied by the $N$ conductors). The $i$-th Laplace problem
reads: Find $\phi_{D,i}$ in $\{ u \in H^1(\Omega_I) : u|_{\partial\Omega_{C,j}} = \delta_{ij} \}$, such that

\[ \int_{\Omega_I} \operatorname{grad}\phi_{D,i} \cdot \operatorname{grad}\phi' = 0 \quad \forall \phi' \in H^1_0(\Omega_I). \]



The Laplace problems are solved using standard MG. 
ProjectingIteratorPreProcess() {

    assemble B                                   // discrete weak divergence operator

    assemble L                                   // discrete Laplacian

    compute Dirichlet basis {grad φ_D,i}

    compute Maxwell's potential coefficients p_ij

}



ProjectingIteratorStep(EdgeVector c) {

    d_node ← B c                                 // weak divergence

    solve (or sweep(s) with initial value c_node = 0 on)

        L c_node = -d_node                       // weak divergence corr.

    compute N conductor fluxes Q_j

    for i ← 1(1)N

        c_node ← c_node + (Σ_j p_ij Q_j) φ_D,i   // total fluxes elimination

    c ← c + G_ΩI c_node                          // discrete gradient

}



In the projection step itself, one Poisson problem must be solved. This can
be done approximately (resulting in an approximate projection) by doing one or
several sweeps of an iterative solver (typically MG). After this first correction step,
all Dirichlet vector fields are removed from c using the a priori computed potentials
of a Dirichlet basis. $G_{\Omega_I}$ denotes the node-to-edge incidence matrix, restricted to
nodes in $\Omega_I$ (the discrete gradient operator in the insulating region).
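The weak divergence correction at the core of the projection step can be illustrated on a tiny graph. The sketch below is a simplified stand-in, not the UG implementation: it assumes an identity mass matrix (so that the weak divergence is the transpose of the incidence matrix), solves the Poisson problem exactly instead of by MG sweeps, pins one node in place of the Dirichlet conditions, and omits the removal of harmonic Dirichlet fields. All names are hypothetical.

```python
import numpy as np


def gradient_matrix(edges, n_nodes):
    """Node-to-edge incidence matrix G: (G phi)[e] = phi[head] - phi[tail]."""
    g = np.zeros((len(edges), n_nodes))
    for e, (tail, head) in enumerate(edges):
        g[e, tail] = -1.0
        g[e, head] = 1.0
    return g


def project(c, g, free):
    """Return c + G phi such that the weak divergence vanishes on the free nodes."""
    gf = g[:, free]                       # discrete gradient restricted to free nodes
    d_node = gf.T @ c                     # weak divergence B c with B = G^T (identity mass)
    lap = gf.T @ gf                       # discrete Laplacian L = B G
    phi = np.linalg.solve(lap, -d_node)   # L c_node = -d_node
    return c + gf @ phi                   # c <- c + G c_node


# Unit square with one diagonal: nodes 0..3, node 0 pinned.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
g = gradient_matrix(edges, 4)
free = [1, 2, 3]

c = np.array([1.0, -2.0, 0.5, 3.0, 1.5])          # arbitrary edge vector
c_proj = project(c, g, free)

loop = np.array([1.0, 1.0, 0.0, 0.0, -1.0])       # circulation around the cycle 0-1-2-0
```

Because only a discrete gradient is added, the circulation around any closed loop, and with it the discrete curl that carries the magnetic field, is left unchanged, which mirrors the statement that the projection does not alter the defect.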



TEAM 7 benchmark problem 

To validate the proposed algorithm and its implementation, we considered the
TEAM benchmark problem 7 (see [5]). The problem consists of an aluminum plate
($\sigma = 3.526 \cdot 10^7$ S/m) with a hole and an excitation coil above the plate with a
time-harmonic driving current of 2742 AT (see Fig. 4). The driving current reaches the
maximum at $\omega t = 0$ and is directed anticlockwise in Fig. 4. The computational
domain $\Omega$ was artificially restricted to a cube with 1 m edge length with boundary
conditions $n \times E = 0$ at $\partial\Omega$.

To generate a hierarchy of grids at the base level $l = 0$, uniform refinement was
applied; see Fig. 5 for the base level grid. The grid levels $l \geq 1$ are generated by local
refinement with the help of a residual error estimator (see [3]). Additionally, the
grid was refined along the line AB in Fig. 4. By means of this adaptive algorithm,
grids up to level 5 were generated.

For the solution of the linear systems of equations, we applied BiCGStab again 
as the outer iteration. The pre-conditioner was composed of one V(1,1)-MG cycle 
with the hybrid smoother and one iteration of Algorithm 6 with a single V(1,1)-MG 
sweep on the Poisson problem. 
Fig. 4. TEAM 7 benchmark problem



Note that local MG (e.g. see [11]) was applied here, i.e. at each level,
smoothing was done only for the refined region and not over the whole grid. This
is essential for an adaptive algorithm since otherwise the overall complexity may 
not be optimal. The table on the right of Fig. 5 shows that the number of itera- 




level   #elements   #edges      #iterations
0       5.6·10^?    6.7·10^?
1       4.5·10^?    5.3·10^?    15
2       7.2·10^?    8.8·10^?    15
3       8.0·10^?    1.0·10^?    16
4       1.6·10^?    2.0·10^?    15
5       3.8·10^?    4.7·10^?    16



Fig. 5. Left: Base level grid (level l = 0), the grid in the air region is not displayed.
Right: Number of BiCGStab iterations for a defect reduction by factor 10”^ using 
a V(1,1)-multigrid pre-conditioner



tions remains constant during the refinement. Thus, MG works very well for the 
benchmark problem. The solution at level 5 is displayed in Fig. 6 and shows good 
agreement with measured results. 



Fig. 6. Left: Real part of the current density J in the aluminum plate. Right:
$\mathrm{sign}(\mathrm{Re}\,B_z)\,|B_z|$ along line AB in Fig. 4. Note that the computed B is piecewise
constant and the grid is very fine at level 5.
Abstract. This article introduces a method for modeling and analysis of perturbed
oscillator behavior, i.e. the behavior of “ideal” oscillators subjected to weak inter- 
actions with the outside world. These interactions can e.g. involve an amplitude- 
regulating mechanism, as in harmonic oscillators, couplings to other oscillators, as 
in quadrature-type oscillators, and noise, both white and colored. The method is 
grounded on perturbation theory and averaging. Perturbation techniques allow us 
to separate the analysis of the unperturbed, ideal, oscillator from the analysis of the 
perturbed one. Averaging is used to separate the fast-varying and the slow-varying
components of the oscillator’s behavior. Applications of this method include oscil- 
lator phase noise analysis and the construction of compact behavioral models for 
harmonic oscillators. 



Introduction 

Oscillators are key building blocks in almost all of today’s communication systems. 
Because of their importance, there is a need for methods that generate compact 
behavioral oscillator models. These models can be used for verification and trade- 
off analysis at the architectural level and for obtaining insight into the oscillator’s 
behavior and functioning. 

This work presents a method for modeling perturbed oscillator behavior, i.e. the 
changes in the behavior of an “ideal” oscillator when subjected to weak interactions 
with the outside world. These interactions can e.g. involve an amplitude-regulating 
mechanism, as illustrated in Fig. 1, couplings to other oscillators and noise, both 
white and colored. 

The modeling method performs a sequence of transformations on the original 
set of circuit equations and is solidly grounded on perturbation theory [7] and 
averaging [2,5]. A first transformation applies perturbation techniques to model the 
impact perturbations have on the behavior of an ideal, unperturbed oscillator. A 
second, averaging, transformation separates the fast-varying from the slow-varying
components of the oscillator’s behavior. The averaging transformation presented 
in this work deals with both deterministic and noisy systems. The transformed 
oscillator equations make up a valuable starting point for further analysis or for the 
construction of compact behavioral models. The equations are valid over an infinite 
time interval and can be integrated using a time step of the order $T/\epsilon$, where $T$ is
the oscillation period and $\epsilon$ a perturbation scaling parameter.
Fig. 1. An ideal oscillator, in this case a lossless LC tank, weakly interacts with a
cross-coupled transistor pair. This transistor pair serves as an amplitude-regulating 
mechanism. 

The remainder of this text is organized as follows. The section "Perturbed Oscillator
Behavior" discusses how to characterize the unperturbed oscillator and how to use
perturbation techniques to capture the behavior of oscillators subjected to weak
perturbations. The section "Averaging" shows how to handle the resulting equations
efficiently using averaging. The section "Applications and Results" applies these
results to phase noise analysis and to the construction of compact behavioral models
for harmonic oscillators. Finally, the section "Conclusions" provides some conclusions.



Perturbed Oscillator Behavior 

This section concerns the structure of the solutions of the (properly normalized)
system of equations

$$\frac{dx}{dt} = f(x(t)) + \epsilon\, b(x(t), t) . \qquad (1)$$

This system describes the behavior of a weakly perturbed oscillator. Here, $x \in \mathbb{R}^N$
represents a vector of unknown state variables, $f(x)$ models the behavior of
the unperturbed oscillator and $\epsilon\, b(x, t)$ represents a perturbation term disturbing
the oscillator's behavior. This perturbation term can be either deterministic or
stochastic. We furthermore assume the perturbation variable $\epsilon$, characterizing the
magnitude of the perturbation term, to be small, i.e. $|\epsilon| \ll 1$. The analysis that
follows generalizes the results in [6], which discusses oscillator phase behavior, while
establishing the link with the theory of motions over stable manifolds [1].

In solving (1), we first characterize the set of steady-state solutions of the
unperturbed system, i.e. for $\epsilon = 0$. Assuming that all solutions converge, in the limit
$t \to \infty$, to a $P$-dimensional, stable manifold $M$, each steady-state solution $x(t)$, i.e.
each solution of (1) which lies entirely on $M$, can be characterized as

$$x(t) = x_s(t, p) . \qquad (2)$$

Here $x_s(t,p): \mathbb{R} \times \mathbb{R}^P \to \mathbb{R}^N$ with $P \leq N$ and $p \in \mathbb{R}^P$ a vector specifying the
initial conditions of the solutions of the unperturbed system that are located on $M$.
As illustrated by the applications in the section "Applications and Results", the
components of $p$ correspond, for example, to amplitudes, phases and phase differences.

Due to the action of the perturbation term $\epsilon\, b(x, t)$, solutions $x_\epsilon(t)$ of the perturbed
system, i.e. for $|\epsilon| > 0$, deviate from the manifold $M$. However, stability of $M$
ensures that the $x_\epsilon(t)$ remain confined to its neighborhood. Solutions can therefore
be written as

$$x_\epsilon(t) = x_s(t, p(t,\epsilon)) + \epsilon\, \Delta x(t,\epsilon) . \qquad (3)$$

They contain two components. The first one, $x_s(t, p(t,\epsilon))$, describes the motion
over the manifold $M$. The second component, $\epsilon\, \Delta x(t,\epsilon)$, describes the departure
from the attracting manifold. Note that the processes $p(t,\epsilon)$ and $\Delta x(t,\epsilon)$ depend
on both $t$ and $\epsilon$. For notational convenience, the dependence on $\epsilon$ is only
mentioned explicitly when relevant.

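While the motion over the manifold may grow unbounded, i.e. $\|p(t) - p(0)\| \to \infty$
as $t \to \infty$, the manifold's stability guarantees that $p(t)$ can be chosen such that the
departure $\epsilon\, \Delta x(t)$ remains $O(\epsilon)$ for all $t$. Substituting (3) into (1) therefore
yields

$$\frac{\partial x_s}{\partial p}(t, p(t))\, \frac{dp}{dt} + \epsilon\, \frac{d \Delta x}{dt} = \epsilon\, \frac{\partial f}{\partial x}(x_s(t, p(t)))\, \Delta x(t) + \epsilon\, b(x_s(t, p(t)), t) + O(\epsilon^2) . \qquad (4)$$

Here we used the fact that, for arbitrary but fixed $p$, $x_s(t,p)$ solves (1) for $\epsilon = 0$.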

The motion $x_s(t, p(t))$ over the manifold $M$ typically contains that part of the
behavior which is of greatest interest. It is completely determined by the process
$p(t)$. The equations governing this process are obtained by projecting (4) onto the
tangent space of $M$ spanned by $U(t,p) = \frac{\partial x_s}{\partial p}(t,p): \mathbb{R} \times \mathbb{R}^P \to \mathbb{R}^{N \times P}$. Clearly,
$U(t,p)$ satisfies

$$\frac{\partial U}{\partial t}(t,p) = \frac{\partial f}{\partial x}(x_s(t,p))\, U(t,p) . \qquad (5)$$

With $V(t,p): \mathbb{R} \times \mathbb{R}^P \to \mathbb{R}^{N \times P}$ a solution of

$$\frac{\partial V^T}{\partial t}(t,p) = -V^T(t,p)\, \frac{\partial f}{\partial x}(x_s(t,p)) , \qquad (6)$$

it is readily seen that the product $V^T(t,p)\, U(t,p)$ remains constant with respect to
$t$. Here, we are interested in the bi-orthonormality case: $V^T(t,p)\, U(t,p) = I_P$ for all $t$,
with $I_P \in \mathbb{R}^{P \times P}$ the identity matrix. Note, however, that if $P < N$, (6) and the
bi-orthonormality condition do not uniquely characterize $V^T(t,p)$. An additional
constraint, related to the requirement that $\epsilon\, \Delta x(t)$ must remain bounded, needs to
be imposed. In practice, this almost always requires $V^T(t,p)$ to be (quasi-)periodic
with respect to $t$. Multiplying both sides of (4) with $V^T(t,p)$ while using (6) and
$V^T U = I_P$ then yields

$$\frac{dp}{dt} = \epsilon\, V^T(t, p(t))\, b(x_s(t, p(t)), t) - \epsilon\, \frac{d}{dt}\left( V^T(t, p(t))\, \Delta x(t) \right) + O(\epsilon^2) . \qquad (7)$$

In order to explicitly separate the motion over the manifold $M$, characterized by
the process $p(t)$, from the deviation from the manifold, characterized by $\Delta x(t)$, we
require $\Delta x(t)$ to satisfy

$$V^T(t, p(t))\, \Delta x(t) = 0 . \qquad (8)$$

With (8), the second right-hand side term in (7) drops out, leaving

$$\frac{dp}{dt} = \epsilon\, V^T(t, p(t))\, b(x_s(t, p(t)), t) + O(\epsilon^2) . \qquad (9)$$

This set of ordinary differential equations (ODEs), which only depends upon $p$ and
$t$, completely characterizes the motion $x_s(t, p(t))$ over the manifold $M$.
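
The projection described in this section can be made concrete with a small numerical sketch. It assumes a lossless LC tank normalized to unit frequency, x1' = x2, x2' = -x1, with steady-state solutions parameterized by an amplitude A and a phase theta, and a van der Pol-type perturbation standing in for the amplitude regulator. The perturbation and all function names are illustrative assumptions and are not taken from the paper.

```python
import math

def x_s(t, A, theta):
    """Steady-state solution of the unperturbed tank x1' = x2, x2' = -x1."""
    phi = t + theta
    return [A * math.cos(phi), -A * math.sin(phi)]

def U(t, A, theta):
    """Tangent-space basis U = d x_s / d p for p = (A, theta)."""
    phi = t + theta
    return [[math.cos(phi), -A * math.sin(phi)],
            [-math.sin(phi), -A * math.cos(phi)]]

def V_T(t, A, theta):
    """Bi-orthonormal partner of U; here P = N = 2, so V^T is the inverse of U."""
    phi = t + theta
    return [[math.cos(phi), -math.sin(phi)],
            [-math.sin(phi) / A, -math.cos(phi) / A]]

def b(x, t):
    """Assumed perturbation: van der Pol-type amplitude regulation."""
    return [0.0, (1.0 - x[0] ** 2) * x[1]]

def matmul2(M, N):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dp_dt(t, A, theta, eps):
    """Right-hand side eps * V^T * b of the projected ODE, O(eps^2) term dropped."""
    VT = V_T(t, A, theta)
    bx = b(x_s(t, A, theta), t)
    return [eps * (VT[i][0] * bx[0] + VT[i][1] * bx[1]) for i in range(2)]

# V^T U should equal the 2x2 identity matrix for every t and p.
product = matmul2(V_T(0.7, 1.5, 0.3), U(0.7, 1.5, 0.3))
# Instantaneous rates (dA/dt, dtheta/dt) for eps = 0.05.
rate = dp_dt(0.7, 1.5, 0.3, 0.05)
```

For this example the amplitude rate works out to eps * A * sin^2(t + theta) * (1 - A^2 cos^2(t + theta)), which still oscillates at twice the tank frequency.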



Averaging 



The system of ODEs (9), governing the motion of the perturbed oscillator over the
manifold $M$, belongs to the class of ODEs structured as

$$\frac{dp}{dt} = \epsilon \sum_{k=1}^{K} h_k(p(t), t) . \qquad (10)$$

Comparing with (9), we find $\sum_{k=1}^{K} h_k(p, t) = V^T(t, p)\, b(x_s(t, p), t)$. We make it clear
later on why it is sometimes useful to partition this right-hand side into a number
of different components. Equation (10) can be solved efficiently using averaging
[2,5]. Averaging exploits the fact that solutions of (10) will only vary substantially
on a time scale $T/\epsilon$, i.e. they have a dominant slow-varying component. Use of
averaging techniques allows us to extract that part of (10) producing this dominant
slow-varying component. The resulting averaged system of ODEs serves as an
excellent starting point for noise and stability analysis or for extraction of compact
behavioral models.
Technically, averaging involves constructing a near-identity, time-varying
transformation of variables $p \to \bar{p}$ (see footnote 1), whereby values of the original
vector $p$ and the newly introduced vector $\bar{p}$ are related to each other through

$$p = \bar{p} + \epsilon \sum_{k=1}^{K} w_k(\bar{p}, t) . \qquad (11)$$

The terms $w_k(\bar{p}, t)$ in this transformation should be chosen such that (a) the
transformed ODEs are easier to solve than the original ones, and (b) $w_k(\bar{p}, t)$ remains
bounded over the time interval of interest, which, in our case, is an infinite time
interval. Both conditions are met for the $w_k(\bar{p}, t)$ satisfying

$$\frac{\partial w_k}{\partial t}(\bar{p}, t) = h_k(\bar{p}, t) - M_k[h_k](\bar{p}, t) , \qquad (12)$$

where the operator $M_k[\cdot]$ represents some suitable, low-pass, averaging operator.

Substituting (11) into (10) and using (12) shows the transformed process $\bar{p}(t)$
to satisfy

$$\frac{d\bar{p}}{dt} = \epsilon \sum_{k=1}^{K} \bar{h}_k(\bar{p}(t), t) + O(\epsilon^2) , \qquad (13)$$

with $\bar{h}_k(\bar{p}, t) = M_k[h_k](\bar{p}, t)$. With careful choice of the $M_k[\cdot]$, this transformed
equation is often much easier to solve than (10). It is called the averaged equation
corresponding to (10). This averaged equation makes up a valuable starting point
for further analysis or for the construction of compact behavioral models.

Selecting the averaging operator $M_k[\cdot]$ is a degree of freedom which can be used
to optimize the properties of the averaged ODEs (13). For each component $h_k$ in
(10), a different averaging operator can be selected. This can, for example, be used
to exploit particular properties of the different components $h_k$. Some convenient
averaging operators are

$$M_k[h_k] = \frac{1}{T_k} \int_{t - T_k/2}^{t + T_k/2} h_k(p, \tau)\, d\tau \qquad (14)$$

$$M_k[h_k] = \frac{\sin(\pi t / T_k)}{\pi t} \otimes h_k(p, t) , \qquad (15)$$

Footnote 1: At this point, it might be useful to stress the distinction between a vector
$p \in \mathbb{R}^P$ and a process $p(t): \mathbb{R} \to \mathbb{R}^P$. As a vector, $p$ should be treated as having an
arbitrary but fixed value. As a process, $p(t)$ links a vector in $\mathbb{R}^P$ to each instant
of time. To avoid confusion, we will always write $p$ to indicate an arbitrary but
fixed value and $p(t)$ to indicate a time-dependent process.




Oscillator Modeling Using the Averaging Principle



with $\otimes$ denoting the convolution operator and $T_k$ a suitable time constant. Here,
(14) is the classical averaging operator [2] and (15) is the ideal low-pass filter over
the frequency range $f \in [-1/(2T_k),\, 1/(2T_k)]$. Both averaging operators (14) and (15)
ensure that $\bar{p}(t)$ contains almost all the low-frequency content of $p(t)$, while the
correction term in (11) only contains high-frequency components. This way, slow-varying
and fast-varying components of the oscillator's behavior are explicitly separated.
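
As a rough numerical illustration of the classical averaging operator (14), the sketch below applies a moving average over one period to an assumed fast-varying right-hand side, h(A, t) = A sin^2(t) (1 - A^2 cos^2(t)), of the kind that arises for the amplitude of a weakly nonlinear van der Pol-type oscillator. The function h, the midpoint-rule quadrature and all names are assumptions made for the example.

```python
import math

def h(A, t):
    """Assumed fast-varying right-hand side for a fixed amplitude A."""
    return A * math.sin(t) ** 2 * (1.0 - A ** 2 * math.cos(t) ** 2)

def moving_average(func, p, t, T_k, n=2000):
    """Operator (14): mean of func(p, tau) over [t - T_k/2, t + T_k/2] (midpoint rule)."""
    d_tau = T_k / n
    start = t - T_k / 2.0
    total = sum(func(p, start + (i + 0.5) * d_tau) for i in range(n))
    return total * d_tau / T_k

A = 1.5
T = 2.0 * math.pi                      # period of the unperturbed oscillation
h_bar_early = moving_average(h, A, 0.4, T)
h_bar_late = moving_average(h, A, 2.0, T)
closed_form = 0.5 * A * (1.0 - A ** 2 / 4.0)
```

Because the averaging window spans exactly one period, the result no longer depends on t and agrees with the closed form (A/2)(1 - A^2/4): the fast oscillation has been removed and only the slow amplitude dynamics remain.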

In order to illustrate how averaging helps in simplifying noise analysis, let us
assume that, for a fixed $p$, $h_k(p, t)$ is a Gaussian, $T$-periodic cyclostationary noise
source with its autocorrelation equal to $\Phi_k(p, t_1, t_2) = E\{h_k(p, t_1)\, h_k^T(p, t_2)\}$. Using
results from cyclostationary noise theory [8], it is then possible to show that,
with $M_k[\cdot]$ being the ideal low-pass filter defined in (15), $\bar{h}_k(p, t) = M_k[h_k](p, t)$
is a Gaussian, stationary noise process with its autocorrelation determined by



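$$\bar{\Phi}_k(p, \tau) = \frac{\sin(\pi \tau / T)}{\pi \tau} \int_{-T/2}^{T/2} \Phi_k\left(p,\, t - \frac{\tau}{2},\, t + \frac{\tau}{2}\right) dt , \qquad (16)$$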



where $\tau = t_2 - t_1$. This result is readily verified by considering it from a
frequency-domain point of view. Typically, this reduction from cyclostationary to stationary
noise processes greatly facilitates further analysis. An interesting application of this
result involves solving the oscillator phase noise equation.



Applications and Results 

Phase Noise Analysis 

Modeling oscillator phase noise behavior, as discussed in [3,4,6], turns out to be a
special case of the method presented in this paper. Here, the noise-free oscillator
corresponds to the unperturbed system while the noise sources are associated
with the perturbation part. Assuming the oscillator to be mono-stable, the manifold
$M$ introduced in the section "Perturbed Oscillator Behavior" coincides with the
oscillator's orbit in the phase space. The solutions of the unperturbed system that
are located on this 1-dimensional manifold are characterized as

$$x_s(t, p) = x_{osc}(t + \theta) . \qquad (17)$$

Here, $x_{osc}(t)$ is a $T$-periodic solution of the noise-free oscillator equations while
$p = [\theta]$ is a 1-dimensional parameter vector. The parameter $\theta$ represents the oscillator's
phase, i.e. a measure for the position of the oscillator's initial state $x_0 = x_s(0, p) =
x_{osc}(\theta)$ on its orbit in the phase space.

Given a noise source $n(t)$ and using the perturbation techniques outlined in
the section "Perturbed Oscillator Behavior", we obtain a phase noise equation that
is structured as

$$\frac{d\theta}{dt} = \epsilon\, v(t + \theta(t))\, n(t) , \qquad (18)$$

where $v(t)$ is $T$-periodic. This phase noise equation describes the oscillator's stochastic
phase behavior, i.e. its random motion along its own orbit.

In what follows, we assume $n(t)$ to be a Gaussian, stationary noise source with
autocorrelation $\Phi(\tau)$. We furthermore impose that either $\Phi(\tau) \approx 0$ on the scale
$T/\epsilon$ or that the spectrum of $n(t)$ is mainly contained within the frequency band
$[-1/(2T),\, 1/(2T)]$. The results in the section "Averaging" then allow us to prove that
the solution of (18) is, up to zero-th order in $\epsilon$, equal to that of the averaged equation



$$\frac{d\bar{\theta}}{dt} = \epsilon\, \bar{n}(t) . \qquad (19)$$

Here, $\bar{n}(t)$ is a Gaussian, stationary noise source with its autocorrelation determined
by

$$\bar{\Phi}(\tau) \approx \frac{1}{T} \int_{-T/2}^{T/2} v\left(t + \theta + \frac{\tau}{2}\right) v\left(t + \theta - \frac{\tau}{2}\right) \Phi(\tau)\, dt \approx \frac{1}{T} \int_{-T/2}^{T/2} v\left(t + \frac{\tau}{2}\right) v\left(t - \frac{\tau}{2}\right) \Phi(\tau)\, dt . \qquad (20)$$

Note how the dependence on $\theta$ disappears due to the periodicity of $v(t)$. This
averaged noise signal $\bar{n}(t)$ turns out to be the stationary component of the
cyclostationary noise signal $v(t)\, n(t)$.

With cyclostationarity being removed, (19) is much easier to solve than (18).
The results obtained in this way correspond to those in [4] but avoid the necessity
to construct the often complicated (modified) Fokker-Planck equations.
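
The practical content of the averaged phase equation can be checked with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions rather than the authors' procedure: n(t) is taken as unit-intensity white noise, the phase equation is read in the Itô sense and integrated with the Euler-Maruyama scheme, and the T-periodic function v(t) = 1 + 0.5 cos(2 pi t) with T = 1 is a hypothetical choice. Under these assumptions the averaged model predicts a phase variance that grows as eps^2 * mean(v^2) * t.

```python
import math
import random

def v(t):
    """Assumed T-periodic sensitivity function with T = 1 (hypothetical choice)."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * t)

def phase_variance(eps, t_end, dt, n_paths, seed=0):
    """Euler-Maruyama estimate of E[theta(t_end)^2] for d(theta) = eps*v(t+theta) dW."""
    rng = random.Random(seed)
    steps = int(round(t_end / dt))
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        theta = 0.0
        for k in range(steps):
            theta += eps * v(k * dt + theta) * rng.gauss(0.0, 1.0) * sqrt_dt
        total += theta * theta
    return total / n_paths

eps, t_end = 0.1, 20.0
simulated = phase_variance(eps, t_end, dt=0.04, n_paths=1000)

# Averaged model: stationary white noise with intensity eps^2 * mean(v^2).
mean_v2 = 1.0 + 0.5 ** 2 / 2.0
predicted = eps ** 2 * mean_v2 * t_end
```

With only 1000 sample paths the estimate carries a statistical error of a few percent, so the comparison between `simulated` and `predicted` is meaningful only up to that tolerance.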



Behavioral Modeling of Harmonic Oscillators 

Constructing behavioral models for (coupled) harmonic oscillators [9] constitutes 
another useful application of the method discussed above. These behavioral models 
can, for example, be used for system-level simulations and trade-off analysis. 

For a system of $K$ coupled harmonic oscillators, the $K$ lossless resonance tanks
can be considered as corresponding to the unperturbed system. Currents coming
from the feedback transistors are associated with the perturbations. This partitioning
is illustrated in Fig. 1 for a single harmonic oscillator. The manifold $M$,
containing the steady-state solutions of the unperturbed system, is therefore
$2K$-dimensional, i.e. two degrees of freedom per resonance tank. A compact behavioral
model can now be obtained using the averaged ODE (13) with its right-hand side
approximated through, for example, polynomial regression or table lookup and
interpolation. The advantages of such a behavioral model are twofold. Firstly, the
averaged ODE can be integrated using a time step that is proportional to $T/\epsilon$,
compared to a time step $T$ needed for solving the original circuit equations. Secondly,
evaluating the approximated right-hand side involves a computational cost that
is well below the one corresponding to the evaluation of a set of complex transistor
equations.
The behavioral model extraction algorithm was implemented in Matlab. For
the harmonic oscillator in Fig. 1, we constructed a behavioral model based upon
table lookup and interpolation of the averaged ODEs (13). The details of the model
extraction algorithm are outlined in [9]. Constructing the model takes 3 minutes
of CPU time. Fig. 2 compares the startup behavior of the harmonic oscillator as
predicted by the behavioral model (envelope signal) with results from SPICE-like
integration of the original set of circuit equations. As can be seen, results are in
excellent agreement (within 1%). The behavioral model, however, evaluates about two
orders of magnitude faster than the SPICE-like simulations, i.e. 0.4 seconds
versus 37.2 seconds for a simulation run over 40 periods of oscillation. Use of the
behavioral model hence significantly speeds up simulations, which is, for example,
useful for lengthy and repetitive simulations as in system-level verification and
trade-off analysis.
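
The kind of comparison shown in Fig. 2 can be mimicked on a toy problem. The following sketch is not the authors' Matlab implementation: it assumes a normalized van der Pol oscillator as a stand-in for the LC tank with amplitude regulation, uses the well-known averaged amplitude equation dA/dt = eps (A/2)(1 - A^2/4) as the "behavioral model", and integrates both with a hand-written fixed-step Runge-Kutta scheme. The value of eps and the step sizes are arbitrary choices for illustration.

```python
import math

EPS = 0.05  # assumed perturbation scaling parameter

def full_rhs(x):
    """Full model: x1' = x2, x2' = -x1 + eps*(1 - x1^2)*x2."""
    return [x[1], -x[0] + EPS * (1.0 - x[0] ** 2) * x[1]]

def averaged_rhs(a):
    """Averaged amplitude equation: dA/dt = eps*(A/2)*(1 - A^2/4)."""
    return [EPS * 0.5 * a[0] * (1.0 - a[0] ** 2 / 4.0)]

def rk4(rhs, y, dt, steps):
    """Classical fixed-step Runge-Kutta integration; returns the final state."""
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

T = 2.0 * math.pi  # period of the unperturbed tank
periods = 40

# Full model: 100 steps per period. Averaged model: a single step per period.
x_end = rk4(full_rhs, [0.1, 0.0], T / 100.0, periods * 100)
a_end = rk4(averaged_rhs, [0.1], T, periods)

amplitude_full = math.hypot(x_end[0], x_end[1])
amplitude_avg = a_end[0]
```

In this toy setting the averaged model needs 40 steps where the full model needs 4000, and both startup transients settle near an amplitude of 2; the remaining difference is the O(eps) ripple that the envelope model deliberately discards.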





Fig. 2. Oscillator startup behavior. The fast-oscillating line is computed by solving 
the original set of circuit equations using a SPICE-like algorithm. The envelope is 
computed by solving the behavioral model equations. 



Conclusions 



In this article, we have discussed a method for modeling and analysis of perturbed 
oscillator behavior. This method deals with perturbations that can be both deter- 
ministic, e.g. an amplitude-regulating mechanism as in harmonic oscillators, and 
stochastic, e.g. noise sources causing oscillator phase noise. The method is founded 
upon perturbation theory and averaging. The latter allows us to explicitly separate 
the fast-varying and the slow-varying components of the oscillator’s behavior. Use of
this method was illustrated for oscillator phase noise analysis and the construction 
of compact behavioral models for harmonic oscillators. 
Abstract. This paper presents an analytical approach for calculating fringing
permeances in gapped inductors. For most gapped inductors, the permeance of
the field paths outside the air gap (the fringing paths) is not negligible. Existing
three-dimensional modelling techniques using finite element analysis for magnetic
components are accurate, but require a prohibitive amount of simulation time.
Two-dimensional models are often used, but their accuracy is low as a 2D simulation fails
to take into account important 3D effects. We present analytical approximations for
fringing permeance calculation for the most usual field patterns, denoted as basic
cases. The proposed fringing coefficients can be used to represent all symmetrical
cases and cases with multiple air gaps. The derived equations are sufficient for
normal engineering accuracy.



Introduction 

Gapped inductors are widely used in power electronics equipment. In most
designs the additional inductance due to the fringing flux path around the air
gaps is important and should be taken into account.

There are many discretisation methods, such as the Finite Difference Method (FDM),
Finite Difference in Time Domain (FDTD), the Finite Element Method (FEM) and the
Boundary Element Method (BEM), which allow an accurate representation and calculation
of the field in a gapped core inductor. The advantages of the numerical methods are
the flexibility for modelling irregular field geometries and boundaries and the
possibility to handle non-linearity of the material. The air gap effects have been discussed
using discretisation methods in several papers [1-3], and 2-D and 3-D solutions have
been given. An improved computer-aided optimisation of inductor design considering
air gap effects has been proposed in [4]. Other general methods for calculating
gapped inductors are presented in [5-8], all of them having their advantages and
disadvantages. Analytical solutions based on the Schwarz-Christoffel transformation
[9] have been proposed. Although the mathematical basis is promising, the
accuracy is low because the field problem is solved for conductors placed far
away from the air gap, which is usually not the case in practical arrangements. The
fringing effects are also important for the design of electrical machines, and the
correction factors and slotting factors (Carter factor) have been discussed analytically
in [10,11]. The analytical methods have their advantages: the possibility of generating
diagrams, solving the reverse problems and optimising more complex problems.
Neglecting the fringing flux, one could use a simple equation for the inductance of a
coil:

$$L = \mu_0 N^2 \frac{A_g}{\sum l_g + l_m / \mu_r} \qquad (1)$$

where:

— $\sum l_g$: sum of the air gap lengths in the flux path;

— $A_g$: cross section of the air gap, equal to core cross section;

— $l_m$: length of the flux path in a core;

— $\mu_r$: permeability of the core material;

— $N$: number of turns.

For large air gaps, the permeance of other field paths (fringing paths) outside the air
gap is not negligible. This results in much larger values for $L$ than predicted by
(1). In almost all designs of gapped inductors for power electronics the fringing
field should be considered, and thus expression (1) gives a poor approximation.
A better approximation for centre-gapped UU and centre-gapped EE cores is
McLyman's formula [12]:

$$L' = L\, X_f , \qquad X_f = 1 + \frac{l_g}{\sqrt{A_g}} \ln\left(\frac{2w}{l_g}\right) \qquad (2)$$

in which

— $L'$: inductance corrected for fringing; $X_f$: fringing factor;

— $w$: total height of the winding; $l_g$: air gap length.

Expression (2) does not distinguish between air gaps with rectangular and
round cross-sections and is limited to small air gaps.
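
To give a feeling for the size of the fringing correction, the sketch below evaluates the no-fringing inductance and a McLyman-type fringing factor. It assumes the usual forms L = mu0 N^2 A_g / (sum(l_g) + l_m / mu_r) and X_f = 1 + (l_g / sqrt(A_g)) ln(2w / l_g); the function names and the numerical values are hypothetical and do not come from the paper.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space in H/m

def inductance_no_fringing(n_turns, gap_area, gap_length_sum, core_path_length, mu_r):
    """L = mu0 * N^2 * A_g / (sum(l_g) + l_m / mu_r), all lengths in metres."""
    return MU_0 * n_turns ** 2 * gap_area / (gap_length_sum + core_path_length / mu_r)

def mclyman_fringing_factor(gap_length, gap_area, winding_height):
    """X_f = 1 + l_g / sqrt(A_g) * ln(2 w / l_g) (assumed form of McLyman's factor)."""
    return 1.0 + gap_length / math.sqrt(gap_area) * math.log(2.0 * winding_height / gap_length)

# Hypothetical example: 100 turns, 1 cm^2 gap face, 1 mm gap, 10 cm core path.
L = inductance_no_fringing(n_turns=100, gap_area=1e-4, gap_length_sum=1e-3,
                           core_path_length=0.1, mu_r=2000.0)
X_f = mclyman_fringing_factor(gap_length=1e-3, gap_area=1e-4, winding_height=0.02)
L_corrected = L * X_f
```

For these example values the fringing factor is roughly 1.37, i.e. neglecting the fringing flux would underestimate the inductance by more than a third.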

In this paper, to obtain better accuracy, we propose the following approach:

— First, investigate and provide basic analytical approximations for the fringing
coefficient for several basic cases of different air gap configurations.

— Secondly, use these results to solve usual cases.

The derived fringing coefficients are compared and tuned using FEM calculations.
The proposed approach is valid for round wires or litz wires. The extension of the
proposed approach to 3-D is given in subsequent work of the authors [13].



Proposed Analytical Approach: Basic Cases 

It is clear that, in a 2D problem, the fringing permeance is proportional to the third
dimension. The fringing permeance mainly leads to a correction of the permeance
of the air gap and has little influence on the main permeance.

The permeance of an air gap is:

$$\Lambda_g = \mu_0 \frac{A_g}{l_g} + \mu_0\, C_g\, F \qquad (3)$$

where

— $\Lambda_g$: permeance of the air gap;

— $A_g$: surface of the air gap;

— $C_g$: the third dimension;

— $l_g$: total length of air gap;

— $F$: fringing coefficient (see later).

We will derive the expressions for the fringing coefficients for the basic cases.
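
The structure of the gap permeance, a main term plus a fringing term proportional to the third dimension, can be sketched as follows. The sketch assumes the form Lambda_g = mu0 A_g / l_g + mu0 C_g F; the fringing coefficient F is passed in as a plain number, and the value F = 1.2 as well as the geometry are hypothetical.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space in H/m

def gap_permeance(gap_area, gap_length, third_dimension, fringing_coefficient):
    """Main gap permeance plus fringing permeance, in henry."""
    main = MU_0 * gap_area / gap_length
    fringing = MU_0 * third_dimension * fringing_coefficient
    return main + fringing

# Hypothetical values: 10 mm x 10 mm gap face, 1 mm gap, assumed F = 1.2.
without_fringing = gap_permeance(1e-4, 1e-3, 1e-2, 0.0)
with_fringing = gap_permeance(1e-4, 1e-3, 1e-2, 1.2)
relative_increase = with_fringing / without_fringing - 1.0
```

Since the main term scales with A_g / l_g while the fringing term scales with C_g F, the relative weight of fringing grows with the gap length, which is why the correction matters most for large air gaps.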

Basic case 1 

Fig. 1. Magnetic field in the basic case 1 (conductors surrounded by a core).

In the basic case 1, the conductors are surrounded by magnetic material except at
the air gap (see Fig. 1). The permeability of the magnetic material is assumed to
be infinite. We propose the following approximation for the fringing coefficient $F_1$:



Fi (d, c, h) 



2 ,f7 + l.,ih-dy{h-0.26d-0.5c) 

_ *“V 1,1^+ 1„U2 




( 4 ) 



in which

— $F_1$: fringing coefficient for the basic case 1;

— $d$: air gap distance to the reference;

— $c$: thickness of winding;

— $h$: height of the winding compared to the reference plane.

The tuning constants (0.26 and 0.5) have been fitted using the software Finite
Element Method Magnetics (FEMM 3.1) [14]. The normalised values of $F_1$ ($h = 1$),
obtained by the proposed approximation (4) and by FEM simulation, are shown in
Fig. 2(a). Fig. 2(b) shows the deviation between the proposed approximation and the
finite element simulation. The error is below 3% (the relative error on $F_1$).

Fig. 2. (a) Fringing coefficient $F_1$ of (4) as a function of $d$; $h = 1$; $c$ is a parameter;
$F_{FEMM1}$ denotes the values found by FEM simulation. (b) Ratio of the values of $F_1$
obtained by the proposed approximation and by FEMM; $h = 1$; $c$ is a parameter.

Basic case 2 

In the basic case 2, the winding touches the core but all other sides of the winding
are surrounded by air, see Fig. 3. In this case the fringing field is also mainly
concentrated near the air gap; the field lines are almost parts of circles. We propose
the following expression for the fringing coefficient $F_2$:



2 , ,0.44(/i^ +c^) -0.218 + 0.67 cd + 0 . 33 ft c + 0.7825 1 / 2 ,,, 

F2(d,c,h) = - ln[ ^ — J (5) 

7T 



For small $d$, the expression (5) is symmetrical with respect to $c$ and $h$. The normalised
values of $F_2$ ($h = 1$), obtained by the proposed approximation (5) and by the FEM
simulation, are shown in Fig. 4(a). Fig. 4(b) shows the deviation between the
proposed approximation and the finite element simulation results. The matching is
about 1% (relative error on $F_2$).

Note that basic case 1 and basic case 2 are almost equal for $c = h$. The reason for
this is that no m.m.f. (magneto-motive force) is present and thus there is not much
field far away from the air gap, so that the presence of material does not influence
the result much.



Fig. 3. Magnetic field in the basic case 2 (conductors in open area).

Basic case 3

The basic case 3 represents a problem where no conductors are present (see Fig. 5)
and the m.m.f. can be put in the air gap. This is the case for the outside legs
(for instance of EE, ETD, ER cores). The total height is now larger than the winding
height, so we use $g$ instead of $h$. The fringing coefficient is:

Fzid^g) = tacosh[3.395(^)^ + 0.15 + 1.1155] (6) 

7T da 

For better interpolation in the expression (6) we use the function acosh instead of
the logarithmic function. Fig. 6 shows the proposed approximation (6) and the finite
element simulation results. The matching is better than 1% (relative error on $F_3$).

Fig. 4. (a) Fringing coefficient $F_2$ of (5) as a function of $d$; $h = 1$; $c$ is a parameter.
(b) Ratio of the values of $F_2$ obtained by the proposed approximation and by FEMM
as a function of $d$; $h = 1$; $c$ is a parameter.

Fig. 5. Magnetic field in the basic case 3.

Fig. 6. Fringing coefficient $F_3$ of (6) as a function of $d$; $g = 1$.



Basic case 4 

Fig. 7. Magnetic field of the combination of the basic cases 3 and 4.

The basic case 4 represents a top-to-bottom problem without conductors (Fig. 7).
This field pattern occurs when the yoke-to-yoke m.m.f. is not zero. The fringing
coefficient is:



F4(a,5) = tacosh[1.4(-)°-®® + l] (7) 

7T g 

Figure 8 shows the fringing coefficient $F_4$ as a function of $a$; $g = 1$. From Fig. 8 it
is seen that when $a$ is small the value of $F_4$ decreases and almost vanishes. The
cases 3 and 4 are not independent: we subtracted case 3 to find case 4.
We observed that the result for case 4 was quite independent of $d$ in a range for
d = 0^ . . . j 0.75p, which confirms that the separation is possible. 



Application of the obtained fringing coefficients 

By using the fringing coefficients obtained for the basic cases, we
can represent the fringing field in symmetrical cases.

As an example, let us consider a symmetrical case representing a wound centre leg
core with an air gap in the middle. Each side has a field pattern identical to
case 1. In that case $d = l_g/2$. The permeance of the air gap should be divided by
2. Thus, the fringing coefficient $F_{1s}$ is:

$$F_{1s}(l_g, c, w) = \frac{F_1(l_g/2,\, c,\, w/2)}{2} \qquad (8)$$



The proposed approach and approximation can be extended from 2D to 3D. Those
calculations can take into account specific effects such as corner effects. That extension
is outside the scope of this paper and is discussed in subsequent work of the
authors.




Vencislav Valchev et al.




Fig. 8. Fringing coefficient $F_4$ as a function of $a$; $g = 1$.

Conclusion 

A quite general approach to calculate 2-D fringing permeances of gapped inductors
has been proposed. Analytical expressions for the fringing coefficients of four basic
cases are derived and verified by finite element calculation. The proposed fringing
coefficients can be used to represent all symmetrical cases of gapped inductors as
well as multiple air gap cases. A typical accuracy of 1 to 2% can be obtained for
the fringing permeance by the proposed equations. For small air gaps the obtained
accuracy usually exceeds the usual accuracy of the physical dimensions of the cores.
Abstract. In this paper we describe how stochastic differential-algebraic equations
(SDAEs) arise as a mathematical model for network equations that are influenced 
by additional sources of Gaussian white noise. We give the necessary analytical 
theory for the existence and uniqueness of strong solutions, provided that the sys- 
tems have noise-free constraints and are uniformly of DAE-index 1. We express 
these conditions in terms of the network-topology for reasons of use within a cir- 
cuit simulator. In the second part we analyze discretization methods. Due to the 
differential-algebraic structure, implicit methods will be necessary. By the examples 
of the drift-implicit Euler and Milstein schemes we show how drift-implicit schemes 
for SDEs can be adapted to become directly applicable to stochastic DAEs and 
prove that the convergence properties of these methods known for SDEs are pre- 
served. For illustration we apply the drift-implicit Euler scheme to an oscillator 
circuit. 



Problem Formulation 

The charge-oriented Modified Nodal Analysis (MNA) represents a standard tool 
in circuit simulation. The equations are generated automatically by combining the 
network topology, Kirchhoff’s Current Law, and the characteristic equations de- 
scribing the physical behaviour of the network elements. This results in large sys- 
tems of DAEs, whose special structure was analyzed in a number of papers, e.g. 
[5,6,16]. Due to decreasing signal-to-noise ratios in special applications, linear noise
analysis around the deterministic solution is no longer satisfactory. The noise
influences such systems in an essentially nonlinear way; the solution of the deterministic
models differs from the mean of the solution of the stochastic models. Transient
noise analysis is necessary. Here we deal with models where Gaussian white noise
sources are added to the systems. Thermal noise of resistors and shot noise of
pn-junctions are modelled as external Gaussian white noise sources in parallel to the
original element (see Figures 1 and 2), as will be explained next.

Nyquist's theorem (see e.g. [2-4,18]) states that the current through an arbitrary
linear resistor having a resistance $R$, maintained in thermal equilibrium at a
temperature $T$, can be described as the sum of the noiseless, deterministic current and
a current due to a Gaussian white noise process with spectral density $S_{th} := \frac{2kT}{R}$,
where $k$ is Boltzmann's constant. Hence, the additional current is modelled as

$$I_{th} = \sigma_{th} \cdot \xi(t) = \sqrt{\frac{2kT}{R}} \cdot \xi(t) ,$$

where $\xi(t)$ is a standard Gaussian white noise process. In [17,18] a thermodynamical
foundation to apply this model to mildly nonlinear resistors and reciprocal networks
is given.



Figure 1. Thermal noise of a resistor.

Figure 2. Shot noise of a pn-junction.



Shot noise of pn-junctions, caused by the discrete nature of current due to the
elementary charge, is also modelled by a Gaussian white noise process, where
the spectral density is proportional to the current $I$ through the pn-junction:
$S_{shot} := q|I|$, where $q$ is the elementary charge. If the current through the
pn-junction is described by a characteristic $I = g(u)$, where $u$ is some voltage, the
additional current is modelled by

$$I_{shot} = \sigma_{shot}(u) \cdot \xi(t) = \sqrt{q\,|g(u)|} \cdot \xi(t),$$

where $\xi(t)$ is a standard Gaussian white noise process. For a discussion of the model
assumptions we refer to [2-4,17,18].
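The two noise intensities can be evaluated numerically as in the following minimal sketch. It assumes the two-sided spectral densities $2kT/R$ for thermal noise and $q|I|$ for shot noise; the function names and the Shockley-type diode law standing in for the characteristic $g(u)$ (with its parameters `i_sat` and `u_t`) are illustrative assumptions, not taken from the paper.

```python
import math

K_BOLTZMANN = 1.380649e-23      # Boltzmann constant k in J/K
Q_ELEMENTARY = 1.602176634e-19  # elementary charge q in C

def sigma_thermal(resistance, temperature):
    """Noise intensity sqrt(2kT/R) of a resistor's thermal noise current."""
    return math.sqrt(2.0 * K_BOLTZMANN * temperature / resistance)

def diode_current(u, i_sat=1e-14, u_t=0.02585):
    """Hypothetical characteristic I = g(u) (Shockley-type diode law)."""
    return i_sat * (math.exp(u / u_t) - 1.0)

def sigma_shot(u, g=diode_current):
    """Noise intensity sqrt(q |g(u)|) of a pn-junction's shot noise current."""
    return math.sqrt(Q_ELEMENTARY * abs(g(u)))

print(sigma_thermal(1.0e3, 300.0))  # 1 kOhm resistor at 300 K
print(sigma_shot(0.6))              # junction biased at 0.6 V
```

Note that the thermal intensity is a constant (additive noise), whereas the shot noise intensity depends on the solution through $u$ (multiplicative noise).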

We represent the topology of a network by means of the incidence matrix
$(A_C, A_R, A_L, A_V, A_I, A_N)$, with indices referring to branches of capacitances,
resistances, inductances, possibly controlled voltage and current sources, and $n_N$
additional noise sources, respectively. Then the charge-oriented MNA system has the
following structure (see [5,6] for the deterministic case):

$$A_C\, q' + f_1(e, j_L, j_V, t) + A_N\, \mathrm{diag}\big(\sigma(A_N^T e, t)\big)\, \xi(t) = 0, \tag{1}$$
$$\phi' - A_L^T e = 0, \tag{2}$$
$$A_V^T e - v_s(e, j_L, t) = 0, \tag{3}$$
$$q - q_C(A_C^T e, t) = 0, \tag{4}$$
$$\phi - \phi_L(j_L, t) = 0, \tag{5}$$

where $f_1(e, j_L, j_V, t) := A_R\, g(A_R^T e, t) + A_L j_L + A_V j_V + A_I\, i_s(e, j_L, j_V, t)$, and
$q_C, g, \phi_L, v_s, i_s, \sigma$ are given, noiseless functions. The vector of unknowns
describing the system behaviour consists of all node potentials $e$, the branch currents
$j_L, j_V$ of current-controlled elements (inductances and voltage sources), the
charges $q$ of capacitances, and the fluxes $\phi$ of inductances. $\xi$ denotes an
$n_N$-dimensional vector of independent standard Gaussian white noise processes. In
industry-relevant applications one has to deal with a large number of unknowns
and noise sources.

The first block of equations (1) denotes a stochastic integral equation:

$$A_C\, q(s)\Big|_{t_0}^{t} + \int_{t_0}^{t} f_1(x(s), s)\, ds + \int_{t_0}^{t} A_N\, \mathrm{diag}\big(\sigma(x(s), s)\big)\, dw(s) = 0,$$

where the Itô integral is employed and $w$ denotes an $n_N$-dimensional Wiener process
(also known as Brownian motion) given on the probability space $(\Omega, \mathcal{F}, P)$ with a
filtration $(\mathcal{F}_t)_{t \ge t_0}$ (see e.g. [7] for the stochastic background). A solution $x = x(t, \omega)$
is now a stochastic process depending on the time $t$ and the chance element $\omega \in \Omega$.
The parameter $\omega$ is omitted in the notation above. The solution $x(t) = x(t, \cdot)$ for
fixed time $t$ is a vector-valued random variable in $L^2(\Omega)$; a realization $x(\cdot, \omega)$ is
called a path.
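To make the notion of a path concrete, the following sketch samples one realization of a scalar Wiener process on a uniform grid by summing independent Gaussian increments with variance equal to the stepsize. The function name `wiener_path` and the use of a seeded generator to stand in for a fixed chance element are illustrative assumptions.

```python
import math
import random

def wiener_path(t0, t_end, n_steps, rng):
    """Sample one path of a scalar Wiener process on a uniform grid."""
    h = (t_end - t0) / n_steps
    times = [t0 + i * h for i in range(n_steps + 1)]
    w = [0.0]  # the process starts at zero
    for _ in range(n_steps):
        # Increments over disjoint intervals are independent and N(0, h)
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(h)))
    return times, w

# Fixing the seed plays the role of fixing the chance element omega
times, w = wiener_path(0.0, 1.0, 1000, random.Random(42))
```

Repeating the call with a different seed yields a different path of the same process.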

The equations (1)-(5) form a specially structured Stochastic Differential Algebraic
Equation (SDAE) of the type

$$A\, x(s)\Big|_{t_0}^{t} + \int_{t_0}^{t} f(x(s), s)\, ds + \int_{t_0}^{t} G(x(s), s)\, dw(s) = 0, \tag{6}$$

which is the subject of this paper. $A$ is a constant singular matrix, and $t$ varies over a
compact interval $\mathcal{J}$. The short-hand notation

$$A\, x'(t) + f(x(t), t) + G(x(t), t)\, \xi(t) = 0 \tag{7}$$

emphasizes the relation of (6) to its deterministic counterpart but may be misleading
for readers who are less familiar with the stochastic background. Though the
notation $x'(t)$ is used in (7), a typical realization $x(\cdot, \omega)$ of the solution is nowhere
differentiable.

A process $x(\cdot) = (x(t))_{t \in \mathcal{J}}$ is called a strong solution of (6) if it is adapted to the
filtration (i.e., it does not depend on future information), and if, with probability
1, its sample paths are continuous, the integrals in (6) exist and (6) is fulfilled.



Index 1 SDAEs 

Due to the singularity of the matrix $A$, the deterministic part of (6),

$$A\, x'(t) + f(x(t), t) = 0, \tag{8}$$

where the solution $x$ is now a deterministic function of $t$, forms a DAE. Solutions
have to fulfil the constraints of the equation. The solution components belonging
to $\ker A$ (we call them the algebraic components) do not occur under the differential
operator $d/dt$, and the inherent dynamics live only in a lower-dimensional
subspace. The DAE (8) is characterized as an index 1 DAE iff the constraints are
locally solvable for the algebraic components. Solving an index 1 DAE involves
coupling an integration task with a nonlinear equation solving task. If a DAE is
of higher index, the constraints are not locally solvable for the algebraic components,
and there exist solution components that are determined only by a hidden
differentiation step, which may cause serious difficulties in the numerical solution
of such problems (see e.g. [1,9]).
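The coupling of integration and nonlinear equation solving can be illustrated on a small hypothetical index 1 DAE, $x_1' + x_2 = 0$, $x_2 - \tanh(x_1) = 0$, which is not taken from the paper. The sketch below performs one implicit Euler step with Newton's method applied to the full implicit system, so that the constraint is solved together with the discretized differential equation; the function name and tolerances are assumptions.

```python
import math

def implicit_euler_step(x1_old, x2_guess, h, tol=1e-12, max_iter=20):
    """One implicit Euler step for x1' + x2 = 0, x2 - tanh(x1) = 0."""
    x1, x2 = x1_old, x2_guess
    for _ in range(max_iter):
        # Residuals of the discretized differential equation and the constraint
        F1 = (x1 - x1_old) / h + x2
        F2 = x2 - math.tanh(x1)
        if max(abs(F1), abs(F2)) < tol:
            break
        # Jacobian A/h + f_x = [[1/h, 1], [-s, 1]] with s = d tanh(x1) / d x1
        s = 1.0 - math.tanh(x1) ** 2
        det = 1.0 / h + s
        # Newton update J * d = -F, solved by Cramer's rule
        d1 = (-F1 + F2) / det
        d2 = (-F2 / h - s * F1) / det
        x1, x2 = x1 + d1, x2 + d2
    return x1, x2

x1, x2 = implicit_euler_step(1.0, math.tanh(1.0), 0.1)
print(x1, x2, x2 - math.tanh(x1))  # the constraint residual should be tiny
```

The determinant $1/h + s$ is positive here because the constraint is uniquely solvable for $x_2$, which reflects the index 1 property of this example.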

We assume here that the deterministic part (8) is globally an index 1 DAE in
the sense that the constraints are regularly and globally uniquely solvable for the
algebraic variables. Globally unique solvability is stronger than the deterministic
index 1 condition, which requires only the non-singularity of the corresponding
Jacobian and guarantees only local solvability of the constraints for the algebraic
variables. Globally unique solvability holds for the MNA system (1)-(5) if (see [19])

- there are no loops of capacitances and voltage sources and no cut-sets of
  inductances or current sources,
- the capacitance, conductance, and inductance matrices are symmetric and
  uniformly positive definite, and
- the controlled sources fulfil certain conditions described in [5].

In [13,14] it is shown that special conditions are needed to ensure solution processes
that are not directly affected by white noise. SDAEs satisfying them are called SDAEs
without direct noise; otherwise they are called SDAEs with direct noise. To avoid a
solution process that is directly affected by white noise we have to assume that the
noise sources do not appear in the constraints. This means that

$$\mathrm{im}\, G(x, t) \subseteq \mathrm{im}\, A \quad \forall (x, t) \in \mathbb{R}^n \times \mathcal{J}.$$

This is true for (7) if and only if there are always capacitances in parallel to a noise
source. This is quite restrictive in practical noise modelling (see also the example
in Section 4). Nevertheless, one can also handle many situations where this condition
is violated. Often noisy constraints are only needed for the determination of
algebraic solution components that do not interact with the dynamical ones. Future
work should be dedicated to classifying such situations.

Under these conditions the constraints of the SDAE can be described by the
deterministic equation

$$R\, f(x(t), t) = 0,$$

where $R$ is a projector along $\mathrm{im}\, A$, i.e., $R^2 = R$, $\ker R = \mathrm{im}\, A$. Solving the
constraints for the algebraic components,

$$R\, f(u + v, t) = 0, \quad A v = 0 \quad \Longrightarrow \quad v = \hat{v}(u, t),$$

inserting the result into the differential equations, and scaling the system by a
pseudo-inverse $A^-$ (with $A A^- = I - R$ and $A^- A$ a projector along $\ker A$) leads to a
so-called inherent regular SDE in the differential components $u$:

$$u'(t) + A^- f(u + \hat{v}(u, t), t) + A^- G(u + \hat{v}(u, t), t)\, \xi(t) = 0. \tag{9}$$

It can be shown that (9), together with $x(t) = u(t) + \hat{v}(u(t), t)$, is theoretically
equivalent to (6). Based on this, the following theorem concerning the existence
and uniqueness of solutions of (6) is proved in [19]:



Theorem 1. Let the above conditions be fulfilled for (6), and suppose that $f$ and
$G$ are globally Lipschitz-continuous with respect to $x$, continuous with respect to $t$,
and that $A x^0$ is $\mathcal{F}_{t_0}$-measurable, independent of the Wiener process $w$, and
with finite second moments.

Then there exists a strong solution $x(\cdot)$ of the initial value problem

$$A\, x(t) - A\, x^0 + \int_{t_0}^{t} f(x(s), s)\, ds + \int_{t_0}^{t} G(x(s), s)\, dw(s) = 0, \tag{10}$$

which is pathwise unique. Moreover, the solution $x(t)$ is square-integrable.
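The decoupling that underlies the inherent SDE and Theorem 1 can be followed by hand on a very small example. The circuit below (one node with a capacitor, a resistor and a thermal noise source in parallel) is a hypothetical illustration that does not appear in the paper; the symbols $r$, $c$ for resistance and capacitance are chosen to avoid a clash with the projector $R$.

```latex
% Hypothetical example: parallel RC circuit with a thermal noise source.
% Unknowns x = (q, e)^T, resistance r, capacitance c, noise intensity \sigma.
\begin{align*}
  &q' + \tfrac{1}{r}\, e + \sigma\, \xi(t) = 0, \qquad q - c\, e = 0, \\
  &A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
   f(x) = \begin{pmatrix} e / r \\ q - c\, e \end{pmatrix}, \quad
   G = \begin{pmatrix} \sigma \\ 0 \end{pmatrix}, \quad
   \operatorname{im} G \subseteq \operatorname{im} A, \\
  &R = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad
   A^- = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
   A A^- = I - R, \\
  &R f(x) = 0 \;\Longleftrightarrow\; e = q / c, \qquad
   u = \begin{pmatrix} q \\ 0 \end{pmatrix}, \quad
   \hat{v}(u) = \begin{pmatrix} 0 \\ q / c \end{pmatrix}, \\
  &\text{inherent SDE:} \quad q' + \tfrac{1}{r c}\, q + \sigma\, \xi(t) = 0 .
\end{align*}
```

Since the capacitance lies in parallel to the noise source, the noise enters only the differential equation, and the inherent SDE is a scalar linear SDE with additive noise.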



Discretization Schemes for Index 1 SDAEs 

Starting with the Euler-Maruyama scheme [8], a wide spectrum of numerical methods
for SDEs has been developed. However, first decoupling the SDAE numerically
and then applying a scheme to the resulting inherent SDE would be an inefficient 
procedure in general. We aim at numerical methods for SDAEs that should work 
directly on the given implicit structure, as in the case of deterministic DAEs. Only 
little previous work has been done in this direction. In [13,14] linear SDAEs are 
analyzed and the convergence of the drift-implicit (i.e., implicit in the determin- 
istic part, but explicit in the stochastic part) Euler scheme is proved. In [11,12] a 
scheme with strong order 1 is developed for the specially structured SDAEs arising 
in transient noise simulation for electronic circuits. Later we will point out its rela- 
tion to the drift-implicit Milstein scheme. 

Our approach also applies to nonlinear SDAEs and is used here to analyze the
drift-implicit Euler scheme, and to derive and analyze a drift-implicit Milstein scheme
for SDAEs. The key idea in adapting known methods for SDEs to (6) is to design the
methods such that the iterates $x_\ell$ fulfil the constraints of the SDAE at the current
time-point $t_\ell$.



Drift-Implicit Euler Scheme 

On the deterministic grid $0 = t_0 < t_1 < \ldots < t_N = T$ the drift-implicit Euler
scheme for (6) is given by

$$A\, \frac{x_\ell - x_{\ell-1}}{h_\ell} + f(x_\ell, t_\ell) + G(x_{\ell-1}, t_{\ell-1})\, \frac{1}{h_\ell}\, \Delta w_\ell = 0, \tag{11}$$

where $h_\ell = t_\ell - t_{\ell-1}$ and $\Delta w_\ell = w(t_\ell) - w(t_{\ell-1})$. Realizations of $\Delta w_\ell$ can be
simulated as $N(0, h_\ell I)$-distributed random variables.

The scheme (11) for the SDAE (6) possesses the same convergence properties as
the drift-implicit Euler scheme for SDEs. In general, its order of strong convergence
is 1/2, i.e.,

$$e_\ell := \| x(t_\ell) - x_\ell \|_{L_2} = \big( E\, |x(t_\ell) - x_\ell|^2 \big)^{1/2} \le c \cdot h^{1/2}$$

holds for the mean square norm of the global errors, where $h := \max_\ell h_\ell$. For
additive noise, i.e., $G(x, t) = G(t)$, the order of strong convergence is 1; for small
noise, $G(x, t) = \epsilon\, \tilde{G}(x, t)$, the error is bounded by $O(h + \epsilon^2 h^{1/2})$
(see [10] for related results).
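As a minimal sketch of the drift-implicit Euler scheme, consider a hypothetical parallel RC circuit with a noise source of constant intensity, written as the SDAE $q' + e/r + \sigma\,\xi(t) = 0$, $q - c\,e = 0$ in the unknowns $(q, e)$. The example, the function name and the normalized parameter values are assumptions for illustration only. Because this system is linear, the implicit step can be solved in closed form; a nonlinear circuit would require a Newton iteration in each step.

```python
import math
import random

def drift_implicit_euler_rc(e0, r, c, sigma, h, n_steps, rng):
    """Drift-implicit Euler for q' + e/r + sigma*xi = 0, q - c*e = 0."""
    q, e = c * e0, e0  # consistent initial values fulfil the constraint
    path = [(q, e)]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Wiener increment, N(0, h)
        # Solve (q_new - q)/h + e_new/r + sigma*dw/h = 0 together with
        # the constraint q_new - c*e_new = 0 at the new time-point.
        e = (q - sigma * dw) / (c + h / r)
        q = c * e
        path.append((q, e))
    return path

path = drift_implicit_euler_rc(e0=1.0, r=1.0, c=1.0, sigma=0.1,
                               h=0.01, n_steps=500, rng=random.Random(1))
print(path[-1])
```

Every iterate fulfils the constraint exactly, and the noise term is evaluated with data from the previous time-point, as in the scheme stated above.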



Drift-Implicit Milstein Scheme 

We intend to design this method in such a way that it realizes the drift-implicit
Milstein scheme for the inherent SDE $u'(t) + \hat{f}(u, t) + \hat{G}(u, t)\, \xi(t) = 0$, i.e.,

$$\frac{u_\ell - u_{\ell-1}}{h_\ell} + \hat{f}(u_\ell, t_\ell) + \hat{G}(u_{\ell-1}, t_{\ell-1})\, \frac{1}{h_\ell}\, \Delta w_\ell - \sum_{j=1}^{m} (\hat{g}_j)_u \hat{G}(u_{\ell-1}, t_{\ell-1})\, \frac{1}{h_\ell}\, I_j^\ell = 0,$$

where $I_j^\ell = (I_{i,j}^\ell)_{i=1}^{m}$, $I_{i,j}^\ell = \int_{t_{\ell-1}}^{t_\ell} \int_{t_{\ell-1}}^{s} dw_i(\tau)\, dw_j(s)$, $\hat{G} = (\hat{g}_1, \ldots, \hat{g}_m)$, and
$\hat{f}(u, t) := A^- f(u + \hat{v}(u, t), t)$, $\hat{G}(u, t) := A^- G(u + \hat{v}(u, t), t)$.

The Milstein scheme possesses strong convergence of order 1. It differs from the
Euler scheme by an additional correction term for the stochastic part, which includes
double stochastic integrals. For additive noise the additional term vanishes
and both schemes coincide.

The Milstein scheme for the inherent SDE is realized by

$$A\, \frac{x_\ell - x_{\ell-1}}{h_\ell} + f(x_\ell, t_\ell) + G(x_{\ell-1}, t_{\ell-1})\, \frac{1}{h_\ell}\, \Delta w_\ell - \sum_{j=1}^{m} (g_j)_x X_u A^- G(x_{\ell-1}, t_{\ell-1})\, \frac{1}{h_\ell}\, I_j^\ell = 0,$$

where $G = (g_1, \ldots, g_m)$, which we call the drift-implicit Milstein scheme for (6).
The expression $X_u A^-$ in the latter term can be reformulated (cf. [15]) as

$$X_u A^- = (I + \hat{v}_u) A^- = (A + R f_x)^{-1} (I - R) = (A + h f_x)^{-1} (I - R) + O(h).$$

Without changing the order of the scheme, the partial derivative $X_u$, which is
generally not explicitly known, can be substituted by reusing the Jacobian of the system.
Penski's approach [11,12] results in a similar approximation to the Milstein scheme
in a more specialized setting. The higher order 1 of strong convergence of these
schemes has to be paid for by the use of a large number of double stochastic integrals
and the use of the derivatives of the diffusion coefficients. In an application
with a large number of small noise sources one has to pay much for a mostly
theoretical gain in accuracy.
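For a single noise source ($m = 1$) the double stochastic integral reduces to the well-known closed form $\tfrac{1}{2}(\Delta w^2 - h)$, so the Milstein correction is cheap to evaluate. The sketch below applies the drift-implicit Euler and Milstein steps to a hypothetical scalar inherent SDE $u' + a\,u + b\,u\,\xi(t) = 0$ with multiplicative noise; the test equation, its parameters and the function names are assumptions chosen because an exact solution is available for comparison.

```python
import math
import random

def euler_step(u, a, b, h, dw):
    """Drift-implicit Euler step for u' + a*u + b*u*xi = 0."""
    return (u - b * u * dw) / (1.0 + a * h)

def milstein_step(u, a, b, h, dw):
    """Drift-implicit Milstein step; g(u) = b*u, so g_u * g = b*b*u."""
    i11 = 0.5 * (dw * dw - h)  # double Ito integral for one noise source
    return (u - b * u * dw + b * b * u * i11) / (1.0 + a * h)

def exact_solution(u0, a, b, t, w_t):
    """Exact solution u0 * exp(-(a + b^2/2) t - b w(t)) of the test SDE."""
    return u0 * math.exp(-(a + 0.5 * b * b) * t - b * w_t)

rng = random.Random(3)
a, b, h, n = 1.0, 0.5, 0.01, 100
u_e = u_m = 1.0
w = 0.0
for _ in range(n):
    dw = rng.gauss(0.0, math.sqrt(h))
    u_e, u_m, w = euler_step(u_e, a, b, h, dw), milstein_step(u_m, a, b, h, dw), w + dw
u_x = exact_solution(1.0, a, b, n * h, w)
print(abs(u_e - u_x), abs(u_m - u_x))  # pathwise errors of both schemes
```

For several noise sources the mixed integrals $I_{i,j}$ with $i \ne j$ have no such closed form and must be approximated, which is the cost referred to above.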
Numerical Example

We have applied the drift-implicit Euler scheme to simulate a ring oscillator with
simple MOSFET models, which was also used in [11]. The equivalent circuit diagram
is given in Fig. 3.




Figure 3: Thermal noise sources in a MOSFET ring-oscillator model



The variables in the MNA system are the charges of the six capacitances, the four
nodal potentials and the current through the voltage source. The system fulfils
the assumptions made in Section 2 only partly. It is of index 1, but clearly the
three thermal resistance noise sources directly affect the current through the voltage
source. However, the direct noise occurring in this variable is harmless in the sense
that this variable does not influence others. Omitting this critical variable together
with the nodal equation for node 4 would lead to a system without direct noise.

In this simple model nearly no differences between the solutions of the noisy and
the deterministic problem could be seen. Therefore, we dealt with a system where
the diffusion coefficients had been scaled by a factor of one thousand.




Figure 4: a sample path of the voltage in node 1 (el) compared to the mean over 
100 sample paths (E el) and the noiseless voltage (det el) 



In Fig. 4 we present numerical results obtained with the drift-implicit Euler scheme 
and fixed stepsize h = 10“^^ . We plotted the nodal potential at node 1. The mean 
of the 100 computed sample paths differs considerably from the noiseless solution.